Building wheel for flash-attn (setup.py) ... error

I'm having an issue installing flash-attn from PyPI with pip install:

I am on Windows, and I think flash-attn is mainly for Linux? Either way, it's not building correctly, and I wasn't sure whether there is a workaround for this issue. I am using the newest build available.
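
For anyone reading the log, here is a minimal sketch of how the wheel name on the "Guessing wheel URL" line can be split into its tags. The log reports that no precompiled wheel was found at that URL, which is why the install fell back to building from source. The helper name parse_wheel_url and the regex are my own illustration, not part of flash-attn or pip, and it assumes the usual name-version-python-abi-platform.whl layout with no build tag.

```python
import re

# Hypothetical helper, not part of flash-attn or pip: split a wheel
# filename of the form name-version-python-abi-platform.whl into tags.
# Assumes there is no optional build tag in the filename.
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-(?P<python>[^-]+)"
    r"-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def parse_wheel_url(url: str) -> dict:
    """Return the tags encoded in the last path component of a wheel URL."""
    filename = url.rsplit("/", 1)[-1]
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"not a wheel filename: {filename}")
    return match.groupdict()


# URL copied from the "Guessing wheel URL" line in the log.
url = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/"
    "flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp310-cp310-win_amd64.whl"
)
tags = parse_wheel_url(url)
for key, value in tags.items():
    print(f"{key}: {value}")
```

The platform tag here is win_amd64 and the Python tag is cp310, which matches my setup (Windows, Python 3.10, torch 2.8.0+cu126).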

Using cached marisa_trie-1.3.1-cp310-cp310-win_amd64.whl (143 kB)
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [353 lines of output]


      torch.__version__  = 2.8.0+cu126


      C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\__init__.py:88: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!

              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************

      !!
        dist.fetch_build_eggs(dist.setup_requires)
      running bdist_wheel
      Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp310-cp310-win_amd64.whl
      Precompiled wheel not found. Building from source...
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-310
      creating build\lib.win-amd64-cpython-310\flash_attn
      copying flash_attn\bert_padding.py -> build\lib.win-amd64-cpython-310\flash_attn
      copying flash_attn\flash_attn_interface.py -> build\lib.win-amd64-cpython-310\flash_attn
      copying flash_attn\flash_attn_triton.py -> build\lib.win-amd64-cpython-310\flash_attn
      copying flash_attn\flash_attn_triton_og.py -> build\lib.win-amd64-cpython-310\flash_attn
      copying flash_attn\flash_blocksparse_attention.py -> build\lib.win-amd64-cpython-310\flash_attn
      copying flash_attn\flash_blocksparse_attn_interface.py -> build\lib.win-amd64-cpython-310\flash_attn
      copying flash_attn\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn
      creating build\lib.win-amd64-cpython-310\hopper
      copying hopper\benchmark_attn.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\benchmark_flash_attention_fp8.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\benchmark_mla_decode.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\benchmark_split_kv.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\flash_attn_interface.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\generate_kernels.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\padding.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\setup.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\test_attn_kvcache.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\test_flash_attn.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\test_kvcache.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\test_util.py -> build\lib.win-amd64-cpython-310\hopper
      copying hopper\__init__.py -> build\lib.win-amd64-cpython-310\hopper
      creating build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\ampere_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\blackwell_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\block_info.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\fast_math.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\flash_bwd.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\flash_bwd_postprocess.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\flash_bwd_preprocess.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\flash_fwd.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\flash_fwd_sm100.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\hopper_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\interface.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\mask.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\mma_sm100_desc.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\named_barrier.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\pack_gqa.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\pipeline.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\seqlen_info.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\softmax.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\tile_scheduler.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\utils.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      copying flash_attn\cute\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
      creating build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\bench.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\bwd_prefill.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\bwd_prefill_fused.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\bwd_prefill_onekernel.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\bwd_prefill_split.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\bwd_ref.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\fp8.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\fwd_decode.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\fwd_prefill.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\fwd_ref.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\interface_fa.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\test.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\train.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\utils.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      copying flash_attn\flash_attn_triton_amd\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
      creating build\lib.win-amd64-cpython-310\flash_attn\layers
      copying flash_attn\layers\patch_embed.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
      copying flash_attn\layers\rotary.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
      copying flash_attn\layers\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
      creating build\lib.win-amd64-cpython-310\flash_attn\losses
      copying flash_attn\losses\cross_entropy.py -> build\lib.win-amd64-cpython-310\flash_attn\losses
      copying flash_attn\losses\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\losses
      creating build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\baichuan.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\bert.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\bigcode.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\btlm.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\falcon.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\gpt.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\gptj.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\gpt_neox.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\llama.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\opt.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\vit.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      copying flash_attn\models\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\models
      creating build\lib.win-amd64-cpython-310\flash_attn\modules
      copying flash_attn\modules\block.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
      copying flash_attn\modules\embedding.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
      copying flash_attn\modules\mha.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
      copying flash_attn\modules\mlp.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
      copying flash_attn\modules\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
      creating build\lib.win-amd64-cpython-310\flash_attn\ops
      copying flash_attn\ops\activations.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
      copying flash_attn\ops\fused_dense.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
      copying flash_attn\ops\layer_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
      copying flash_attn\ops\rms_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
      copying flash_attn\ops\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
      creating build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\benchmark.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\distributed.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\generation.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\library.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\pretrained.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\testing.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\torch.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      copying flash_attn\utils\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
      creating build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      copying flash_attn\ops\triton\cross_entropy.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      copying flash_attn\ops\triton\k_activations.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      copying flash_attn\ops\triton\layer_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      copying flash_attn\ops\triton\linear.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      copying flash_attn\ops\triton\mlp.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      copying flash_attn\ops\triton\rotary.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      copying flash_attn\ops\triton\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
      running build_ext
      W1106 16:57:10.125498 19476 site-packages\torch\utils\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
      building 'flash_attn_2_cuda' extension
      creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310
      creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release
      creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc
      creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc\flash_attn
      creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc\flash_attn\src
      W1106 16:57:14.979290 19476 site-packages\torch\utils\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
      [1/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows 
Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
      FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj
      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
      flash_bwd_hdim128_bf16_causal_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_bf16_causal_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_bf16_causal_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_bf16_causal_sm80.cu
      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                           ^

      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                         ^

      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                                 ^

      3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.cu".

      [2/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
      FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
      flash_bwd_hdim128_bf16_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_bf16_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_bf16_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_bf16_sm80.cu
      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                           ^

      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                         ^

      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                                 ^

      3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu".

      [3/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows 
Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_fp16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
      FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj
      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_fp16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
      flash_bwd_hdim128_fp16_causal_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_fp16_causal_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_fp16_causal_sm80.cu
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
      cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
      flash_bwd_hdim128_fp16_causal_sm80.cu
      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                           ^

      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                         ^

      C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
          static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                                 ^

      3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.cu".

      [4/73] cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\flash_api.cpp /FoC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/flash_api.obj -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H 
-DTORCH_EXTENSION_NAME=flash_attn_2_cuda /std:c++17
      FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/flash_api.obj
      cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\flash_api.cpp /FoC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/flash_api.obj -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda 
/std:c++17
      cl : Command line warning D9002 : ignoring unknown option '-O3'
      cl : Command line warning D9002 : ignoring unknown option '-std=c++17'
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2039: 'is_unsigned_v': is not a member of 'cutlass::platform'
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/integer_subbyte.h(235): note: see declaration of 'cutlass::platform'
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): note: the template instantiation context (the oldest one first) is
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(936): note: while compiling class template 'cutlass::float_exmy_base'
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(950): note: see reference to function template instantiation 'auto cutlass::detail::fp_encoding_selector<cutlass::detail::FpEncoding::E8M23>(void)' being compiled
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(860): note: see reference to class template instantiation 'cutlass::detail::FpBitRepresentation<uint32_t,32,8,23,cutlass::detail::NanInfEncoding::IEEE_754,true>' being compiled
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2065: 'is_unsigned_v': undeclared identifier
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint32_t,32,8,23,cutlass::detail::NanInfEncoding::IEEE_754,true>::Storage': expected an expression instead of a type
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2059: syntax error: ','
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2238: unexpected token(s) preceding ';'
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,8,4,3,cutlass::detail::NanInfEncoding::CANONICAL_ONLY,false>::Storage': expected an expression instead of a type
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,8,8,0,cutlass::detail::NanInfEncoding::CANONICAL_ONLY,false>::Storage': expected an expression instead of a type
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,4,2,1,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,6,2,3,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
      C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,6,3,2,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 486, in run
          urllib.request.urlretrieve(wheel_url, wheel_filename)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 241, in urlretrieve
          with contextlib.closing(urlopen(url, data)) as fp:
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
          return opener.open(url, data, timeout)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 525, in open
          response = meth(req, response)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 634, in http_response
          response = self.parent.error(
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 563, in error
          return self._call_chain(*args)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
          result = func(*args)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default
          raise HTTPError(req.full_url, code, msg, hdrs, fp)
      urllib.error.HTTPError: HTTP Error 404: Not Found

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2595, in _run_ninja_build
          subprocess.run(
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 524, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '4']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 526, in <module>
          setup(
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\__init__.py", line 111, in setup
          return distutils.core.setup(**attrs)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\core.py", line 184, in setup
          return run_commands(dist)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\core.py", line 200, in run_commands
          dist.run_commands()
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 964, in run_commands
          self.run_command(cmd)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
          super().run_command(command)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
          cmd_obj.run()
        File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 503, in run
          super().run()
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\bdist_wheel.py", line 384, in run
          self.run_command("build")
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
          super().run_command(command)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
          cmd_obj.run()
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build.py", line 135, in run
          self.run_command(cmd_name)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
          super().run_command(command)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
          cmd_obj.run()
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\build_ext.py", line 96, in run
          _build_ext.run(self)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
          _build_ext.build_ext.run(self)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 359, in run
          self.build_extensions()
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1072, in build_extensions
          build_ext.build_extensions(self)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\Cython\Distutils\old_build_ext.py", line 195, in build_extensions
          _build_ext.build_ext.build_extensions(self)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 476, in build_extensions
          self._build_extensions_serial()
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 502, in _build_extensions_serial
          self.build_extension(ext)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\build_ext.py", line 257, in build_extension
          _build_ext.build_extension(self, ext)
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 557, in build_extension
          objects = self.compiler.compile(
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1041, in win_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2227, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2612, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
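
For anyone reading the log: there are two separate failures in it. First, setup.py tried to download a prebuilt wheel and got `HTTP Error 404: Not Found`, so it fell back to building from source; second, the source build failed on `cutlass/exmy_base.h(404)`. As a rough sketch of the first part, the guessed wheel URL in the log appears to be assembled from the flash-attn version, CUDA major version, torch major.minor, the C++11 ABI flag, the Python tag, and the platform tag. The function name and parameters below are hypothetical and only mirror the pattern visible in the "Guessing wheel URL" line; this is not the actual setup.py code.

```python
# Hypothetical helper: rebuilds the wheel URL pattern seen in the log's
# "Guessing wheel URL" line. Names and parameters are illustrative only.
def guess_wheel_url(version, cuda_major, torch_mm, cxx11_abi, py_tag, platform_tag):
    base = "https://github.com/Dao-AILab/flash-attention/releases/download"
    name = (
        f"flash_attn-{version}"
        f"+cu{cuda_major}torch{torch_mm}cxx11abi{cxx11_abi}"
        f"-{py_tag}-{py_tag}-{platform_tag}.whl"
    )
    return f"{base}/v{version}/{name}"


# Values taken from the log: flash-attn 2.8.3, torch 2.8.0+cu126, cp310, win_amd64.
url = guess_wheel_url("2.8.3", "12", "2.8", "TRUE", "cp310", "win_amd64")
print(url)
```

With the values from my log this reproduces the URL that returned 404, which is consistent with the "Precompiled wheel not found. Building from source..." message.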

Building wheel for flash-attn (setup.py) ... error #101
