Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. #162

chensyo · 2025-03-08T13:30:02Z

After updating to the new version 0.14 I get this error; the previous version 0.13 worked fine. Python 3.11, torch 2.6.0 + CUDA 12.6.
!!! Exception during processing !!! CUDA error: operation not supported
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
File "D:\Program Files\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Program Files\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Program Files\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "D:\Program Files\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Program Files\ComfyUI\custom_nodes\svdquant\nodes\models\flux.py", line 134, in load_model
transformer = transformer.to(device)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mingzhenwang\AppData\Roaming\Python\Python311\site-packages\diffusers\models\modeling_utils.py", line 1077, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mingzhenwang\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1343, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mingzhenwang\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 903, in _apply
module._apply(fn)
File "C:\Users\Mingzhenwang\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 903, in _apply
module._apply(fn)
File "C:\Users\Mingzhenwang\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 903, in _apply
module._apply(fn)
File "C:\Users\Mingzhenwang\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 930, in _apply
param_applied = fn(param)
^^^^^^^^^
File "C:\Users\Mingzhenwang\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1329, in convert
return t.to(
^^^^^
RuntimeError: CUDA error: operation not supported
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Prompt executed in 5.77 seconds
FETCH ComfyRegistry Data: 10/36
FETCH ComfyRegistry Data: 15/36
got prompt
[2025-03-08 21:26:51.585] [info] Initializing QuantizedFluxModel
[2025-03-08 21:26:52.465] [info] Loading weights from D:\Program Files\ComfyUI\models\diffusion_models\svdq-int4-flux.1-schnell\transformer_blocks.safetensors
[2025-03-08 21:26:52.466] [warning] Failed to load safetensors using method MIO: CUDA error: operation not supported (at D:\Program Files\BuildWhl\nunchaku\src\Serialization.cpp:130)
[2025-03-08 21:26:57.272] [info] Done.
!!! Exception during processing !!! CUDA error: operation not supported
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.


Fix #162

sxtyzhangzk · 2025-03-08T14:17:10Z

Could you check if the latest commit fixes this issue? Thanks.

chensyo · 2025-03-08T15:18:11Z

I ran git pull on the latest main, saw that common.h appears to have been updated, then recompiled and reinstalled. It still reports the error:
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

hdfhssg · 2025-03-08T15:43:06Z

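You're on Windows 10, right? I get this error on Windows 10 as well, and I have already resolved it by modifying a file. Once I've tested a bit more and confirmed there are no problems, I'll post my changes. Alternatively, you could wait and see whether the maintainers come up with a better fix.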

sxtyzhangzk · 2025-03-08T15:56:18Z

Could you try setting the environment variable NUNCHAKU_LOAD_METHOD?
Try
set NUNCHAKU_LOAD_METHOD=READ
or
set NUNCHAKU_LOAD_METHOD=READNOPIN
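
For anyone who launches ComfyUI from a Python script rather than from a `cmd` prompt, the same setting can be sketched in Python. This is a minimal sketch, not the project's own API: `set_load_method` is a hypothetical helper, the two allowed values are simply the ones mentioned in this thread, and it assumes the variable is read from the process environment when the weights are loaded, so it has to be set before the model is loaded.

```python
import os

# Name of the environment variable suggested in this thread.
LOAD_METHOD_ENV = "NUNCHAKU_LOAD_METHOD"

# Only the values mentioned in this thread; other values may exist.
SUGGESTED_METHODS = ("READ", "READNOPIN")


def set_load_method(method: str = "READ") -> str:
    """Hypothetical helper: set the load method for the current process."""
    if method not in SUGGESTED_METHODS:
        raise ValueError(f"expected one of {SUGGESTED_METHODS}, got {method!r}")
    os.environ[LOAD_METHOD_ENV] = method
    return os.environ[LOAD_METHOD_ENV]


# Set the variable first, then import or launch whatever loads the model.
# Assumption: the variable is consulted at weight-loading time.
set_load_method("READ")
```

If `READ` still fails, calling `set_load_method("READNOPIN")` before loading the model mirrors the second suggestion above.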

chensyo · 2025-03-08T16:01:44Z

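Yes, I'm on Win10. Previously, version 0.1.3 reported the "Unable to pin memory: operation not supported" error and then crashed. After that I compiled with your modified Serialization.cpp; it still reported an error but was usable. Now, after updating, it no longer works.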

hdfhssg · 2025-03-08T16:02:35Z

OK, I'll try later.

chensyo · 2025-03-08T16:02:57Z

That works. Everything is back to normal now, thanks.

sxtyzhangzk closed this as completed Mar 8, 2025

sxtyzhangzk reopened this Mar 8, 2025

sxtyzhangzk closed this as completed in df37484 Mar 8, 2025

sxtyzhangzk added a commit that referenced this issue Mar 8, 2025

Merge pull request #164 from mit-han-lab/dev

0abe5b8

Fix #162

sxtyzhangzk reopened this Mar 8, 2025
