Description
I'm having an issue installing flash-attn from PyPI with pip install:
The main thing is that I'm on Windows, and I think flash-attn might be Linux-only? Either way, it isn't building correctly, and I wasn't sure whether there is a workaround for this issue. I'm using the newest release available.
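For anyone triaging this, here is a minimal sketch that helps narrow the failure down. It makes two assumptions. First, it assumes the wheel-name pattern visible in the log's "Guessing wheel URL" line: package version, CUDA major version, torch major.minor, C++11 ABI flag, Python tag, and platform tag. It rebuilds that URL so you can check whether a matching prebuilt wheel exists for your combination. Second, it reports whether `cl` and `nvcc` are on PATH, because the log warns "Error checking compiler version for cl: [WinError 2]". `guess_wheel_url` and `toolchain_report` are hypothetical helpers written for illustration. They are not flash-attn's actual setup.py code.

```python
import platform
import shutil
import sys

# Release download prefix, copied from the "Guessing wheel URL" line in the log.
BASE_URL = "https://github.com/Dao-AILab/flash-attention/releases/download"


def guess_wheel_url(flash_version: str, cuda_major: str, torch_major_minor: str,
                    cxx11_abi: bool, py_tag: str, platform_tag: str) -> str:
    """Hypothetical helper: rebuild a wheel URL following the naming pattern seen in the log."""
    wheel = (
        f"flash_attn-{flash_version}+cu{cuda_major}torch{torch_major_minor}"
        f"cxx11abi{str(cxx11_abi).upper()}-{py_tag}-{py_tag}-{platform_tag}.whl"
    )
    return f"{BASE_URL}/v{flash_version}/{wheel}"


def toolchain_report() -> dict:
    """Hypothetical helper: report the interpreter, OS, and whether build compilers are on PATH."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.system(),
        # shutil.which returns None when the executable is not found on PATH.
        # The log's WinError 2 for cl suggests (not proves) this was the case.
        "cl": shutil.which("cl"),
        "nvcc": shutil.which("nvcc"),
    }


if __name__ == "__main__":
    # Values taken from the log: flash-attn 2.8.3, torch 2.8.0+cu126, CPython 3.10, win_amd64.
    print(guess_wheel_url("2.8.3", "12", "2.8", True, "cp310", "win_amd64"))
    for key, value in toolchain_report().items():
        print(f"{key}: {value}")
```

With the values from the log, the first printed line should match the URL that setup.py reports it could not find, which is what makes it fall back to building from source.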
Using cached marisa_trie-1.3.1-cp310-cp310-win_amd64.whl (143 kB)
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [353 lines of output]
torch.__version__ = 2.8.0+cu126
C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\__init__.py:88: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************
!!
dist.fetch_build_eggs(dist.setup_requires)
running bdist_wheel
Guessing wheel URL: https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp310-cp310-win_amd64.whl
Precompiled wheel not found. Building from source...
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-310
creating build\lib.win-amd64-cpython-310\flash_attn
copying flash_attn\bert_padding.py -> build\lib.win-amd64-cpython-310\flash_attn
copying flash_attn\flash_attn_interface.py -> build\lib.win-amd64-cpython-310\flash_attn
copying flash_attn\flash_attn_triton.py -> build\lib.win-amd64-cpython-310\flash_attn
copying flash_attn\flash_attn_triton_og.py -> build\lib.win-amd64-cpython-310\flash_attn
copying flash_attn\flash_blocksparse_attention.py -> build\lib.win-amd64-cpython-310\flash_attn
copying flash_attn\flash_blocksparse_attn_interface.py -> build\lib.win-amd64-cpython-310\flash_attn
copying flash_attn\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn
creating build\lib.win-amd64-cpython-310\hopper
copying hopper\benchmark_attn.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\benchmark_flash_attention_fp8.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\benchmark_mla_decode.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\benchmark_split_kv.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\flash_attn_interface.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\generate_kernels.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\padding.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\setup.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\test_attn_kvcache.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\test_flash_attn.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\test_kvcache.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\test_util.py -> build\lib.win-amd64-cpython-310\hopper
copying hopper\__init__.py -> build\lib.win-amd64-cpython-310\hopper
creating build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\ampere_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\blackwell_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\block_info.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\fast_math.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\flash_bwd.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\flash_bwd_postprocess.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\flash_bwd_preprocess.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\flash_fwd.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\flash_fwd_sm100.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\hopper_helpers.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\interface.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\mask.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\mma_sm100_desc.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\named_barrier.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\pack_gqa.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\pipeline.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\seqlen_info.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\softmax.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\tile_scheduler.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\utils.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
copying flash_attn\cute\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\cute
creating build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\bench.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\bwd_prefill.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\bwd_prefill_fused.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\bwd_prefill_onekernel.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\bwd_prefill_split.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\bwd_ref.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\fp8.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\fwd_decode.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\fwd_prefill.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\fwd_ref.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\interface_fa.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\test.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\train.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\utils.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
copying flash_attn\flash_attn_triton_amd\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\flash_attn_triton_amd
creating build\lib.win-amd64-cpython-310\flash_attn\layers
copying flash_attn\layers\patch_embed.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
copying flash_attn\layers\rotary.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
copying flash_attn\layers\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\layers
creating build\lib.win-amd64-cpython-310\flash_attn\losses
copying flash_attn\losses\cross_entropy.py -> build\lib.win-amd64-cpython-310\flash_attn\losses
copying flash_attn\losses\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\losses
creating build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\baichuan.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\bert.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\bigcode.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\btlm.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\falcon.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\gpt.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\gptj.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\gpt_neox.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\llama.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\opt.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\vit.py -> build\lib.win-amd64-cpython-310\flash_attn\models
copying flash_attn\models\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\models
creating build\lib.win-amd64-cpython-310\flash_attn\modules
copying flash_attn\modules\block.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
copying flash_attn\modules\embedding.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
copying flash_attn\modules\mha.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
copying flash_attn\modules\mlp.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
copying flash_attn\modules\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\modules
creating build\lib.win-amd64-cpython-310\flash_attn\ops
copying flash_attn\ops\activations.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
copying flash_attn\ops\fused_dense.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
copying flash_attn\ops\layer_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
copying flash_attn\ops\rms_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
copying flash_attn\ops\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\ops
creating build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\benchmark.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\distributed.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\generation.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\library.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\pretrained.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\testing.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\torch.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
copying flash_attn\utils\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\utils
creating build\lib.win-amd64-cpython-310\flash_attn\ops\triton
copying flash_attn\ops\triton\cross_entropy.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
copying flash_attn\ops\triton\k_activations.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
copying flash_attn\ops\triton\layer_norm.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
copying flash_attn\ops\triton\linear.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
copying flash_attn\ops\triton\mlp.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
copying flash_attn\ops\triton\rotary.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
copying flash_attn\ops\triton\__init__.py -> build\lib.win-amd64-cpython-310\flash_attn\ops\triton
running build_ext
W1106 16:57:10.125498 19476 site-packages\torch\utils\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
building 'flash_attn_2_cuda' extension
creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310
creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release
creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc
creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc\flash_attn
creating C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc\flash_attn\src
W1106 16:57:14.979290 19476 site-packages\torch\utils\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
[1/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
flash_bwd_hdim128_bf16_causal_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_causal_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_causal_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_causal_sm80.cu
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_bf16_causal_sm80.cu".
[2/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu".
[3/73] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_fp16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj.d -std=c++17 -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc --use-local-env -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" 
"-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src\flash_bwd_hdim128_fp16_causal_sm80.cu -o C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda
flash_bwd_hdim128_fp16_causal_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_causal_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_causal_sm80.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_fp16_causal_sm80.cu
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: namespace "cutlass::platform" has no member "is_unsigned_v"
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: type name is not allowed
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/cutlass/include\cutlass/exmy_base.h(404): error: expected an expression
static_assert(cutlass::platform::is_unsigned_v<Storage>, "Use an unsigned integer for StorageType");
^
3 errors detected in the compilation of "C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/csrc/flash_attn/src/flash_bwd_hdim128_fp16_causal_sm80.cu".
[4/73] cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\flash_api.cpp /FoC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/flash_api.obj -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H 
-DTORCH_EXTENSION_NAME=flash_attn_2_cuda /std:c++17
FAILED: C:/Users/Owner/AppData/Local/Temp/pip-install-2vd4f974/flash-attn_4973479891b74ebc8346c3e18b2b8ef8/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/flash_api.obj
cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\src -IC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\include\torch\csrc\api\include "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\include -IC:\Users\Owner\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -c C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\flash_attn\flash_api.cpp /FoC:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/flash_api.obj -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda 
/std:c++17
cl : Command line warning D9002 : ignoring unknown option '-O3'
cl : Command line warning D9002 : ignoring unknown option '-std=c++17'
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2039: 'is_unsigned_v': is not a member of 'cutlass::platform'
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/integer_subbyte.h(235): note: see declaration of 'cutlass::platform'
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): note: the template instantiation context (the oldest one first) is
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(936): note: while compiling class template 'cutlass::float_exmy_base'
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(950): note: see reference to function template instantiation 'auto cutlass::detail::fp_encoding_selector<cutlass::detail::FpEncoding::E8M23>(void)' being compiled
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(860): note: see reference to class template instantiation 'cutlass::detail::FpBitRepresentation<uint32_t,32,8,23,cutlass::detail::NanInfEncoding::IEEE_754,true>' being compiled
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2065: 'is_unsigned_v': undeclared identifier
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint32_t,32,8,23,cutlass::detail::NanInfEncoding::IEEE_754,true>::Storage': expected an expression instead of a type
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2059: syntax error: ','
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2238: unexpected token(s) preceding ';'
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,8,4,3,cutlass::detail::NanInfEncoding::CANONICAL_ONLY,false>::Storage': expected an expression instead of a type
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,8,8,0,cutlass::detail::NanInfEncoding::CANONICAL_ONLY,false>::Storage': expected an expression instead of a type
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,4,2,1,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,6,2,3,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\csrc\cutlass\include\cutlass/exmy_base.h(404): error C2275: 'cutlass::detail::FpBitRepresentation<uint8_t,6,3,2,cutlass::detail::NanInfEncoding::NONE,true>::Storage': expected an expression instead of a type
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 486, in run
urllib.request.urlretrieve(wheel_url, wheel_filename)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 241, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 525, in open
response = meth(req, response)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 634, in http_response
response = self.parent.error(
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 563, in error
return self._call_chain(*args)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
result = func(*args)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2595, in _run_ninja_build
subprocess.run(
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '4']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 526, in <module>
setup(
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\__init__.py", line 111, in setup
return distutils.core.setup(**attrs)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\core.py", line 184, in setup
return run_commands(dist)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\core.py", line 200, in run_commands
dist.run_commands()
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 964, in run_commands
self.run_command(cmd)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
super().run_command(command)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
cmd_obj.run()
File "C:\Users\Owner\AppData\Local\Temp\pip-install-2vd4f974\flash-attn_4973479891b74ebc8346c3e18b2b8ef8\setup.py", line 503, in run
super().run()
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\bdist_wheel.py", line 384, in run
self.run_command("build")
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
super().run_command(command)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
cmd_obj.run()
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build.py", line 135, in run
self.run_command(cmd_name)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\dist.py", line 948, in run_command
super().run_command(command)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\dist.py", line 983, in run_command
cmd_obj.run()
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\build_ext.py", line 96, in run
_build_ext.run(self)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 359, in run
self.build_extensions()
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1072, in build_extensions
build_ext.build_extensions(self)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\Cython\Distutils\old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 476, in build_extensions
self._build_extensions_serial()
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 502, in _build_extensions_serial
self.build_extension(ext)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\command\build_ext.py", line 257, in build_extension
_build_ext.build_extension(self, ext)
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 557, in build_extension
objects = self.compiler.compile(
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 1041, in win_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2227, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "C:\Users\Owner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\cpp_extension.py", line 2612, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
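Almost every error in this log points at the same spot: `cutlass/exmy_base.h(404)` and `cutlass::platform::is_unsigned_v`. nvcc reports it for the `.cu` files, and cl reports it for `flash_api.cpp`. A small script that collapses a saved log into unique errors makes that easier to see.

This is a sketch, assuming the log was saved to a text file. `summarize_errors` is a hypothetical name, and the regex only covers the two error formats that appear above: nvcc's `file(N): error: msg` and MSVC's `file(N): error CXXXX: msg`.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches nvcc-style "file(404): error: msg" and MSVC-style
# "file(404): error C2039: msg"; warnings and notes are ignored.
ERROR_RE = re.compile(
    r"^(?P<path>.+?)\((?P<line>\d+)\): error(?: (?P<code>C\d+))?: (?P<msg>.*)$"
)


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count unique (file, line, code, message) error diagnostics.

    Only the file's base name is kept, so the same header reached through
    different temp directories or slash styles is counted as one location.
    """
    counts: Counter = Counter()
    for raw in lines:
        m = ERROR_RE.match(raw.strip())
        if not m:
            continue  # command lines, warnings, notes, tracebacks, etc.
        filename = re.split(r"[\\/]", m.group("path"))[-1]
        key = (filename, int(m.group("line")), m.group("code") or "", m.group("msg"))
        counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py build.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (name, line, code, msg), n in summarize_errors(fh).most_common():
            label = f"{name}({line})" + (f" {code}" if code else "")
            print(f"{n:4d}x {label}: {msg}")
```

One possible cause, not confirmed by this log: MSVC keeps reporting `__cplusplus` as `199711L`, even with `/std:c++17`, unless `/Zc:__cplusplus` is also passed. If the bundled CUTLASS only defines `is_unsigned_v` when `__cplusplus >= 201703L`, that would plausibly hide it from both cl and nvcc, which uses cl as its host compiler on Windows. Checking `cutlass/platform/platform.h` in the bundled CUTLASS would confirm or rule this out.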