
Building wheel for flash-attn (setup.py) ... error #101

@Mradr

I'm having an issue installing flash-attn from PyPI with pip install:

I'm mainly on Windows, and I think flash-attn may be meant for Linux? Either way, it isn't building correctly, and I wasn't sure whether there is a workaround for this issue. I'm using the newest release available.
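The "Guessing wheel URL" line in the build output shows that the install first looks for a prebuilt release wheel matching the local torch, CUDA, Python, and platform combination, and only compiles from source ("Precompiled wheel not found. Building from source...") when that file is missing. The minimal sketch below rebuilds that filename from its parts so the mismatch can be inspected. It assumes the naming pattern inferred from the single URL printed in this log; it is not flash-attn's own setup.py code, the function names `guess_wheel_url` and `local_wheel_tags` are hypothetical, and it does not check whether the file actually exists on the releases page.

```python
import sys
import sysconfig


def local_wheel_tags() -> tuple[str, str]:
    """Return (python_tag, platform_tag) for the running interpreter, e.g. ("cp310", "win_amd64")."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # Wheel platform tags replace "-" and "." with "_" (e.g. "win-amd64" -> "win_amd64").
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def guess_wheel_url(flash_version: str, cuda_version: str, torch_version: str,
                    cxx11_abi: bool, python_tag: str, platform_tag: str) -> str:
    """Hypothetical helper: rebuild the release-asset URL pattern seen in the build log."""
    cuda_major = cuda_version.split(".")[0]          # "12.6" -> "12"
    torch_base = torch_version.split("+")[0]         # "2.8.0+cu126" -> "2.8.0"
    torch_mm = ".".join(torch_base.split(".")[:2])   # "2.8.0" -> "2.8"
    abi = "TRUE" if cxx11_abi else "FALSE"
    wheel = (f"flash_attn-{flash_version}+cu{cuda_major}torch{torch_mm}"
             f"cxx11abi{abi}-{python_tag}-{python_tag}-{platform_tag}.whl")
    return ("https://github.com/Dao-AILab/flash-attention/releases/download/"
            f"v{flash_version}/{wheel}")


if __name__ == "__main__":
    py_tag, plat_tag = local_wheel_tags()
    # Version values copied from this log: flash-attn 2.8.3, torch 2.8.0+cu126 (CUDA 12.6).
    print(guess_wheel_url("2.8.3", "12.6", "2.8.0+cu126", True, py_tag, plat_tag))
```

Run on the machine from this report (Python 3.10, win_amd64), the printed URL should match the one in the log; if no asset with that name is published, pip falls back to compiling the CUDA extension locally, which is the step that fails further down.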

Using cached marisa_trie-1.3.1-cp310-cp310-win_amd64.whl (143 kB)
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [353 lines of output]

  torch.__version__  = 2.8.0+cu126


  C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\__init__.py:88: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
  !!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************

  !!
    dist.fetch_build_eggs(dist.setup_requires)
  running bdist_wheel
  Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp310-cp310-win_amd64.whl
  Precompiled wheel not found. Building from source...
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-cpython-310
  creating build\lib.win-amd64-cpython-310\flash_attn
  copying flash_attn\bert_padding.py -> build\lib.win-amd64-cpython-310\flash_attn
  copying flash_attn\flash_attn_interface.py -> build\lib.win-amd64-cpython-310\flash_attn
  copying flash_attn\flash_attn_triton.py -> build\lib.win-amd64-cpython-310\flash_attn
  copying flash_attn\flash_attn_triton_og.py -> build\lib.win-amd64-cpython-310\flash_attn
  copying flash_attn\flash_blocksparse_attention.py -> build\lib.win-amd64-cpython-310\flash_attn
  copying flash_attn\flash_blocksparse_attn_interface.py -> build\lib.win-amd64-cpython-310\flash_attn
  copying flash_attn\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn
  creating build\lib.win-amd64-cpython-310\hopper
  copying hopper\benchmark_attn.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\benchmark_flash_attention_fp8.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\benchmark_mla_decode.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\benchmark_split_kv.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\flash_attn_interface.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\generate_kernels.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\padding.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\setup.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\test_attn_kvcache.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\test_flash_attn.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\test_kvcache.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\test_util.py -> build\lib.win-amd64-cpython-310\hopper
  copying hopper\__init__.py -> build\lib.win-amd64-cpython-310\hopper
  creating build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\ampere_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\blackwell_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\block_info.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\fast_math.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\flash_bwd.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\flash_bwd_postprocess.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\flash_bwd_preprocess.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\flash_fwd.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\flash_fwd_sm100.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\hopper_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\interface.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\mask.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\mma_sm100_desc.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\named_barrier.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\pack_gqa.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\pipeline.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\seqlen_info.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\softmax.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\tile_scheduler.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\utils.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  copying flash_attn\cute\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
  creating build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\bench.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\bwd_prefill.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\bwd_prefill_fused.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\bwd_prefill_onekernel.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\bwd_prefill_split.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\bwd_ref.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\fp8.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\fwd_decode.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\fwd_prefill.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\fwd_ref.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\interface_fa.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\test.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\train.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\utils.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  copying flash_attn\flash_attn_triton_amd\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
  creating build\lib.win-amd64-cpython-310\flash_attn\layers
  copying flash_attn\layers\patch_embed.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
  copying flash_attn\layers\rotary.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
  copying flash_attn\layers\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
  creating build\lib.win-amd64-cpython-310\flash_attn\losses
  copying flash_attn\losses\cross_entropy.py -> build\lib.win-amd64-cpython-310\flash_attn\losses
  copying flash_attn\losses\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\losses
  creating build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\baichuan.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\bert.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\bigcode.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\btlm.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\falcon.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\gpt.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\gptj.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\gpt_neox.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\llama.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\opt.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\vit.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  copying flash_attn\models\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\models
  creating build\lib.win-amd64-cpython-310\flash_attn\modules
  copying flash_attn\modules\block.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
  copying flash_attn\modules\embedding.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
  copying flash_attn\modules\mha.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
  copying flash_attn\modules\mlp.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
  copying flash_attn\modules\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
  creating build\lib.win-amd64-cpython-310\flash_attn\ops
  copying flash_attn\ops\activations.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
  copying flash_attn\ops\fused_dense.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
  copying flash_attn\ops\layer_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
  copying flash_attn\ops\rms_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
  copying flash_attn\ops\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
  creating build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\benchmark.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\distributed.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\generation.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\library.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\pretrained.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\testing.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\torch.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  copying flash_attn\utils\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
  creating build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  copying flash_attn\ops\triton\cross_entropy.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  copying flash_attn\ops\triton\k_activations.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  copying flash_attn\ops\triton\layer_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  copying flash_attn\ops\triton\linear.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  copying flash_attn\ops\triton\mlp.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  copying flash_attn\ops\triton\rotary.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  copying flash_attn\ops\triton\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
  running build_ext
  W1106 16:57:10.125498 19476 site-packages\torch\utils\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  building 'flash_attn_2_cuda' extension
  creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310
  creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release
  creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc
  creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc\flash_attn
  creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc\flash_attn\src
  W1106 16:57:14.979290 19476 site-packages\torch\utils\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  [1/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
  FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj
  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
  flash_bwd_hdim128_bf16_causal_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_bf16_causal_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_bf16_causal_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_bf16_causal_sm80.cu
  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                       ^

  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                     ^

  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                             ^

  3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.cu".

  [2/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
  FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
  flash_bwd_hdim128_bf16_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_bf16_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_bf16_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_bf16_sm80.cu
  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                       ^

  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                     ^

  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                             ^

  3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu".

  [3/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_fp16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
  FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj
  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_fp16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
  flash_bwd_hdim128_fp16_causal_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_fp16_causal_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_fp16_causal_sm80.cu
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
  cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
  flash_bwd_hdim128_fp16_causal_sm80.cu
  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                       ^

  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                     ^

  C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
      static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
                                                             ^

  3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.cu".

  [4/73] cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\flash_api.cpp /FoC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/flash_api.obj -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H 
-DTORCH_EXTENSION_NAME=flash_attn_2_cuda /std:c++17
  FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/flash_api.obj
  cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\flash_api.cpp /FoC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/flash_api.obj -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda 
/std:c++17
  cl : Command line warning D9002 : ignoring unknown option '-O3'
  cl : Command line warning D9002 : ignoring unknown option '-std=c++17'
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2039: 'is_unsigned_v': is not a member of 'cutlass::platform'
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/integer_subbyte.h(235): note: see declaration of 'cutlass::platform'
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): note: the template instantiation context (the oldest one first) is
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(936): note: while compiling class template 'cutlass::float_exmy_base'
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(950): note: see reference to function template instantiation 'auto cutlass::detail::fp_encoding_selector<cutlass::detail::FpEncoding::E8M23>(void)' being compiled
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(860): note: see reference to class template instantiation 'cutlass::detail::FpBitRepresentation<uint32_t,32,8,23,cutlass::detail::NanInfEncoding::IEEE_754,true>' being compiled
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2065: 'is_unsigned_v': undeclared identifier
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint32_t,32,8,23,cutlass::detail::NanInfEncoding::IEEE_754,true>::Storage': expected an expression instead of a type
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2059: syntax error: ','
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2238: unexpected token(s) preceding ';'
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,8,4,3,cutlass::detail::NanInfEncoding::CANONICAL_ONLY,false>::Storage': expected an expression instead of a type
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,8,8,0,cutlass::detail::NanInfEncoding::CANONICAL_ONLY,false>::Storage': expected an expression instead of a type
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,4,2,1,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,6,2,3,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
  C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,6,3,2,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 486, in run
      urllib.request.urlretrieve(wheel_url, wheel_filename)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 241, in urlretrieve
      with contextlib.closing(urlopen(url, data)) as fp:
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
      return opener.open(url, data, timeout)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 525, in open
      response = meth(req, response)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 634, in http_response
      response = self.parent.error(
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 563, in error
      return self._call_chain(*args)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
      result = func(*args)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default
      raise HTTPError(req.full_url, code, msg, hdrs, fp)
  urllib.error.HTTPError: HTTP Error 404: Not Found

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2595, in _run_ninja_build
      subprocess.run(
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 524, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '4']' returned non-zero exit status 1.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 526, in <module>
      setup(
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\__init__.py", line 111, in setup
      return distutils.core.setup(**attrs)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\core.py", line 184, in setup
      return run_commands(dist)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\core.py", line 200, in run_commands
      dist.run_commands()
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 964, in run_commands
      self.run_command(cmd)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
      super().run_command(command)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
      cmd_obj.run()
    File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 503, in run
      super().run()
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\bdist_wheel.py", line 384, in run
      self.run_command("build")
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
      super().run_command(command)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
      cmd_obj.run()
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build.py", line 135, in run
      self.run_command(cmd_name)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
      super().run_command(command)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
      cmd_obj.run()
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\build_ext.py", line 96, in run
      _build_ext.run(self)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
      _build_ext.build_ext.run(self)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 359, in run
      self.build_extensions()
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1072, in build_extensions
      build_ext.build_extensions(self)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\Cython\Distutils\old_build_ext.py", line 195, in build_extensions
      _build_ext.build_ext.build_extensions(self)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 476, in build_extensions
      self._build_extensions_serial()
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 502, in _build_extensions_serial
      self.build_extension(ext)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\build_ext.py", line 257, in build_extension
      _build_ext.build_extension(self, ext)
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 557, in build_extension
      objects = self.compiler.compile(
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1041, in win_wrap_ninja_compile
      _write_ninja_file_and_compile_objects(
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2227, in _write_ninja_file_and_compile_objects
      _run_ninja_build(
    File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2612, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
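Reading the traceback, there are two separate failures here. setup.py first tries to download a prebuilt wheel and gets HTTP 404, then falls back to building from source, and it is that source build which actually fails (the CUTLASS `is_unsigned_v` errors in `exmy_base.h`). The sketch below rebuilds the filename printed on the "Guessing wheel URL" line from its parts (flash-attn version, CUDA major version, torch major.minor, C++11 ABI flag, Python tag, platform), so you can see exactly which combination was looked up. The helper names `guess_wheel_name`, `guess_wheel_url` and `wheel_exists` are hypothetical, and the naming pattern is inferred from the single URL in this log, not copied from flash-attn's setup.py.

```python
import urllib.error
import urllib.request

# Release download prefix, taken from the "Guessing wheel URL" line in the log.
BASE = "https://github.com/Dao-AILab/flash-attention/releases/download"


def guess_wheel_name(version, cuda_major, torch_major_minor, cxx11_abi, py_tag, platform):
    """Hypothetical reconstruction of the wheel filename pattern seen in the log."""
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (
        f"flash_attn-{version}+cu{cuda_major}torch{torch_major_minor}"
        f"cxx11abi{abi}-{py_tag}-{py_tag}-{platform}.whl"
    )


def guess_wheel_url(version, **parts):
    """Join the release tag directory and the guessed filename into a full URL."""
    return f"{BASE}/v{version}/{guess_wheel_name(version, **parts)}"


def wheel_exists(url, timeout=10.0):
    """Send a HEAD request; return False on 404, re-raise any other HTTP error."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


if __name__ == "__main__":
    # Values from this log: flash-attn 2.8.3, torch 2.8.0+cu126, CPython 3.10, Windows x64.
    print(guess_wheel_url("2.8.3", cuda_major="12", torch_major_minor="2.8",
                          cxx11_abi=True, py_tag="cp310", platform="win_amd64"))
```

Calling `wheel_exists(url)` needs network access. For the URL in this log the server answered 404 at the time, so it would have returned False, which is why setup.py fell through to the source build.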
