
Computer get stucked after several correct runs #133

Open
joe-zxh opened this issue Mar 2, 2025 · 9 comments
Labels: bug (Something isn't working) · help wanted (Extra attention is needed) · urgent


joe-zxh commented Mar 2, 2025

Dear author, you have made a great contribution to the research of diffusion models, and I am very interested in your project.

However, my computer gets stuck on writing files after several successful runs of nunchaku:

Image

(Not only can I not save the image in the code, I also cannot save Python files or text files.)
Even with a fixed seed, the problem still reproduces.

Only rebooting the computer recovers it, but the same problem reappears after several runs...

The code to reproduce the problem:

# svdq_debug.py
import torch
import numpy as np
from diffusers import FluxPipeline
from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel
from PIL import Image

# flux_dev_path = "black-forest-labs/FLUX.1-dev"
# svdq_flux_path = "mit-han-lab/svdq-int4-flux.1-dev"
flux_dev_path = "/data2/dev/zxh_ws/sd/ComfyUI_zxh/ComfyUI/models/diffusers/FLUX.1-dev"
svdq_flux_path = "/data2/dev/zxh_ws/sd/ComfyUI_zxh/ComfyUI/models/diffusion_models/svdq-int4-flux.1-dev"

print(f"\n====before save1====")
Image.fromarray(np.zeros((256, 256, 3), dtype=np.uint8)).save("ttt_zero.png")
print(f"====after save1====\n")

transformer : NunchakuFluxTransformer2dModel = NunchakuFluxTransformer2dModel.from_pretrained(svdq_flux_path)
pipeline = FluxPipeline.from_pretrained(
    flux_dev_path, torch_dtype=torch.bfloat16, transformer=transformer
).to("cuda")

image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=25,
    guidance_scale=3.5,
).images[0]
print(f"\n====before save2====")
image.save("ttt9.png")
print(f"====after save2====\n")

My CUDA version is 12.4, torch version is 2.6.0, Python version is 3.11.0, and the OS is Ubuntu 20.04.
I installed nunchaku using your provided wheel:
pip install nunchaku-0.1.3+torch2.6-cp311-cp311-linux_x86_64.whl
(I also tried building from source, but got a similar result.)

My computer environment:
Image

My pip environment:

(zxh_svdq2_cu124) ubuntu@yons-MS-7E06 /data2/dev/zxh_ws/sd/ComfyUI_zxh> pip list                                                     master!?
Package                  Version        Editable project location
------------------------ -------------- ---------------------------------------------
absl-py                  2.1.0
accelerate               1.4.0
aiofiles                 23.2.1
aiohappyeyeballs         2.4.6
aiohttp                  3.11.13
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.8.0
attrs                    25.1.0
backports.tarfile        1.2.0
beautifulsoup4           4.13.3
bitsandbytes             0.45.3
bs4                      0.0.2
build                    1.2.2.post1
CacheControl             0.14.2
cd-fvd                   0.1.1
certifi                  2025.1.31
cffi                     1.17.1
chardet                  5.2.0
charset-normalizer       3.4.1
clean-fid                0.1.35
cleo                     2.1.0
click                    8.1.8
clip                     1.0
colorama                 0.4.6
crashtest                0.4.1
cryptography             44.0.2
DataProperty             1.1.0
datasets                 3.3.2
deepcompressor           0.0.1          /data/dev/zxh-workspace/github/deepcompressor
diffusers                0.32.2
dill                     0.3.8
distlib                  0.3.9
docstring_parser         0.16
dominate                 2.9.1
dulwich                  0.22.7
einops                   0.8.1
evaluate                 0.4.3
fairscale                0.4.13
fastapi                  0.115.11
fastjsonschema           2.21.1
ffmpy                    0.5.0
filelock                 3.17.0
findpython               0.6.2
frozenlist               1.5.0
fsspec                   2024.12.0
ftfy                     6.3.1
fuzzywuzzy               0.18.0
GPUtil                   1.4.0
gradio                   5.20.0
gradio_client            1.7.2
groovy                   0.1.2
h11                      0.14.0
httpcore                 1.0.7
httpx                    0.28.1
huggingface-hub          0.29.1
idna                     3.10
image-reward             1.5
importlib_metadata       8.6.1
installer                0.7.0
jaraco.classes           3.4.0
jaraco.context           6.0.1
jaraco.functools         4.1.0
jeepney                  0.9.0
jieba                    0.42.1
Jinja2                   3.1.5
joblib                   1.4.2
jsonlines                4.0.0
keyring                  25.6.0
Levenshtein              0.26.1
lightning-utilities      0.12.0
lm_eval                  0.4.7
lxml                     5.3.1
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mbstrdecoder             1.1.4
mdurl                    0.1.2
more-itertools           10.6.0
mpmath                   1.3.0
msgpack                  1.1.0
multidict                6.1.0
multiprocess             0.70.16
networkx                 3.4.2
ninja                    1.11.1.3
nltk                     3.9.1
numexpr                  2.10.2
numpy                    2.2.3
nunchaku                 0.1.3+torch2.6
nvidia-cublas-cu12       12.4.5.8
nvidia-cuda-cupti-cu12   12.4.127
nvidia-cuda-nvrtc-cu12   12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.1.3
nvidia-curand-cu12       10.3.5.147
nvidia-cusolver-cu12     11.6.1.9
nvidia-cusparse-cu12     12.3.1.170
nvidia-cusparselt-cu12   0.6.2
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.4.127
omniconfig               0.1.10
opencv-python            4.11.0.86
orjson                   3.10.15
packaging                24.2
pandas                   2.2.3
pathvalidate             3.2.3
pbs-installer            2025.2.12
peft                     0.14.0
pillow                   11.1.0
pip                      25.0
pkginfo                  1.12.1.2
platformdirs             4.3.6
poetry                   2.1.1
poetry-core              2.1.1
portalocker              3.1.1
propcache                0.3.0
protobuf                 5.29.3
psutil                   5.9.8
pyarrow                  19.0.1
pyav                     14.2.1
pybind11                 2.13.6
pycparser                2.22
pydantic                 2.10.6
pydantic_core            2.27.2
pydub                    0.25.1
Pygments                 2.19.1
pyproject_hooks          1.2.0
pytablewriter            1.2.1
python-dateutil          2.9.0.post0
python-Levenshtein       0.26.1
python-multipart         0.0.20
pytz                     2025.1
PyYAML                   6.0.2
RapidFuzz                3.12.1
regex                    2024.11.6
requests                 2.32.3
requests-toolbelt        1.0.0
rich                     13.9.4
rotary-embedding-torch   0.8.6
rouge                    1.0.1
rouge_score              0.1.2
ruff                     0.9.9
sacrebleu                2.5.1
safehttpx                0.1.6
safetensors              0.5.3
scikit-learn             1.6.1
scipy                    1.15.2
SecretStorage            3.3.3
semantic-version         2.10.0
sentencepiece            0.2.0
setuptools               75.8.2
shellingham              1.5.4
six                      1.17.0
sniffio                  1.3.1
soupsieve                2.6
spaces                   0.32.0
sqlitedict               2.1.0
starlette                0.46.0
sympy                    1.13.1
tabledata                1.3.4
tabulate                 0.9.0
tcolorpy                 0.1.7
threadpoolctl            3.5.0
timm                     1.0.15
tokenizers               0.21.0
toml                     0.10.2
tomlkit                  0.13.2
torch                    2.6.0
torchaudio               2.6.0+cu124
torchmetrics             1.6.1
torchvision              0.21.0
tqdm                     4.67.1
tqdm-multiprocess        0.0.11
transformers             4.49.0
triton                   3.2.0
trove-classifiers        2025.2.18.16
typepy                   1.3.4
typer                    0.15.2
typing_extensions        4.12.2
tzdata                   2025.1
urllib3                  2.3.0
uvicorn                  0.34.0
virtualenv               20.29.2
wcwidth                  0.2.13
websockets               15.0
wheel                    0.45.1
word2number              1.1
xformers                 0.0.29.post3
xxhash                   3.5.0
yarl                     1.18.3
zipp                     3.21.0
zstandard                0.23.0

I am looking forward to hearing from you, thanks.


D0522J commented Mar 2, 2025

Out of memory — you may want to check your memory/cache usage when it gets stuck.


joe-zxh commented Mar 2, 2025

Out of memory — you may want to check your memory/cache usage when it gets stuck.

@D0522J Thanks for replying.

I don't think it's a memory issue.

I changed the code to the following to reduce VRAM:

import torch
import numpy as np
from diffusers import FluxPipeline
from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel
from nunchaku.models.text_encoder import NunchakuT5EncoderModel
from PIL import Image

# flux_dev_path = "black-forest-labs/FLUX.1-dev"
# svdq_flux_path = "mit-han-lab/svdq-int4-flux.1-dev"
# te_path = "mit-han-lab/svdq-flux.1-t5"
flux_dev_path = "/data2/dev/zxh_ws/sd/ComfyUI_zxh/ComfyUI/models/diffusers/FLUX.1-dev"
svdq_flux_path = "/data2/dev/zxh_ws/sd/ComfyUI_zxh/ComfyUI/models/diffusion_models/svdq-int4-flux.1-dev"
te_path = "/data2/dev/zxh_ws/sd/ComfyUI_zxh/ComfyUI/models/text_encoders/svdq-flux.1-t5"

print(f"\n====before save1====")
Image.fromarray(np.zeros((256, 256, 3), dtype=np.uint8)).save("ttt_zero.png")
print(f"====after save1====\n")

text_encoder_2 = NunchakuT5EncoderModel.from_pretrained(te_path)

transformer : NunchakuFluxTransformer2dModel = NunchakuFluxTransformer2dModel.from_pretrained(svdq_flux_path)
pipeline = FluxPipeline.from_pretrained(
    flux_dev_path,
    torch_dtype=torch.bfloat16,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    safety_checker=None,
).to("cuda")
pipeline.enable_sequential_cpu_offload()

image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=25,
    guidance_scale=3.5,
).images[0]
print(f"\n====before save2====")
image.save("ttt9.png")
print(f"====after save2====\n")

The stuck problem still happens. I checked the VRAM and RAM using the command:
watch -n 0.5 "free -h && echo && nvidia-smi && echo && echo && top -bn 1 -i -c | grep -E 'python|MEM'"
It shows:

Image

Also, I recorded RAM usage every 0.01 seconds with tensorboard; the maximum RAM I use is about 24 GB, but I have 94 GB on my machine.
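For reference, the polling I did is roughly like the following sketch (a hypothetical stdlib-only helper, not the actual tensorboard setup; peak_rss_gb and sample_peak are made-up names):

```python
# ram_log.py -- minimal sketch of peak-RAM polling (Linux; ru_maxrss is in KB there)
import resource
import time

def peak_rss_gb():
    """Peak resident set size of this process, in GB."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2

def sample_peak(interval_s=0.01, samples=5):
    """Poll the peak RSS every interval_s seconds and return the maximum seen."""
    peak = 0.0
    for _ in range(samples):
        peak = max(peak, peak_rss_gb())
        time.sleep(interval_s)
    return peak

if __name__ == "__main__":
    print(f"peak RSS: {sample_peak():.2f} GB")
```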

When I check processes with the top command, it shows:
Image

Several kworker/u64xxxx-flush processes are occupying the CPU; I don't know what they are...
They appear for only a few seconds each, and the process ID keeps changing while the COMMAND stays kworker xxx.
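From a quick search, kworker/...-flush threads seem to be the kernel's page-cache writeback workers, so maybe writeback to disk is stalling (just my guess, not confirmed). One way to check the dirty-page counters while it hangs:

```shell
# One-shot check of dirty-page / writeback counters
# (wrap it in `watch -n 0.5 "..."` to monitor while the hang occurs)
grep -E 'Dirty|Writeback' /proc/meminfo
```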

@D0522J Any ideas?


joe-zxh commented Mar 2, 2025

Inference succeeds, but it fails when writing files...

I killed the process and reran it; it gets stuck at writing files again.


D0522J commented Mar 3, 2025

Try not connecting to your server through SSH; run the Python example directly on the server instead. I don't know why, but that works.


joe-zxh commented Mar 3, 2025

Try not connecting to your server through SSH; run the Python example directly on the server instead. I don't know why, but that works.

@D0522J Thanks for replying. I tried running the code directly on the server, but it does not work.


pivtienduc commented Mar 4, 2025

Hi @joe-zxh

How did you install deepcompressor into your pip environment? Please guide me.

Mine gets an error like this:

Image

Thank you so much


joe-zxh commented Mar 4, 2025

Hi @joe-zxh

How did you install deepcompressor into your pip environment? Please guide me.

Mine gets an error like this:

Image

Thank you so much

@pivtienduc

  1. Install all the dependencies in the pyproject.toml manually using pip install.
  2. Comment out the dependencies in the pyproject.toml.
  3. Run poetry install.

Stupid, but it works 😂

@pivtienduc


Installing this nunchaku is a pain; every time it's updated, I run into trouble installing the latest version again.

@lmxyy lmxyy self-assigned this Mar 5, 2025
@lmxyy lmxyy added bug Something isn't working urgent help wanted Extra attention is needed labels Mar 5, 2025

joe-zxh commented Mar 9, 2025

@lmxyy It seems v0.1.4 fixes this problem. Thanks a lot for your great contribution.

Is it related to this modification?
Image

Why does replacing gemm w4a4 with gemm awq solve this problem?

4 participants