
[Usage]: [V1] Misleading Error Messages #13510


Open
1 task done
robertgshaw2-redhat opened this issue Feb 19, 2025 · 7 comments · May be fixed by #17938
Labels
good first issue (Good for newcomers) · help wanted (Extra attention is needed) · usage (How to use vllm)

Comments

@robertgshaw2-redhat
Collaborator

robertgshaw2-redhat commented Feb 19, 2025

Looking for help to improve error messages during startup!

Running a model that does not exist (e.g. MODEL=neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic, which does not exist) gives the following stack trace:

(venv-nm-vllm-abi3) rshaw@beaker:~$ VLLM_USE_V1=1 vllm serve $MODEL --disable-log-requests --no-enable-prefix-caching
INFO 02-19 03:45:16 __init__.py:190] Automatically detected platform cuda.
INFO 02-19 03:45:18 api_server.py:840] vLLM API server version 0.7.2.0
INFO 02-19 03:45:18 api_server.py:841] args: Namespace(subparser='serve', model_tag='neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic', config='', host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, disable_log_requests=True, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, 
dispatch_function=<function serve at 0x74d76eb99990>)
WARNING 02-19 03:45:18 arg_utils.py:1326] Setting max_num_batched_tokens to 8192 for OPENAI_API_SERVER usage context.
Traceback (most recent call last):
  File "/home/rshaw/venv-nm-vllm-abi3/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/scripts.py", line 204, in main
    args.dispatch_function(args)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/scripts.py", line 44, in serve
    uvloop.run(run_server(args))
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/home/rshaw/.pyenv/versions/3.10.14/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/home/rshaw/.pyenv/versions/3.10.14/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 160, in build_async_engine_client_from_engine_args
    engine_client = AsyncLLMEngine.from_engine_args(
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 104, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1075, in create_engine_config
    model_config = self.create_model_config()
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 998, in create_model_config
    return ModelConfig(
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/config.py", line 302, in __init__
    hf_config = get_config(self.model, trust_remote_code, revision,
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/transformers_utils/config.py", line 201, in get_config
    raise ValueError(f"No supported config format found in {model}")
ValueError: No supported config format found in neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic

This is confusing: the real problem is that the model repository does not exist, but the error only says that no supported config format was found.
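For illustration only, here is a minimal sketch of how the config path could fail fast with a clearer message before falling through to "No supported config format found". This is not vLLM's actual code; the function name, the messages, and the exact huggingface_hub calls used are assumptions.

    # Illustrative sketch only -- not vLLM's implementation.
    import os

    from huggingface_hub import HfApi
    from huggingface_hub.utils import GatedRepoError


    def check_model_exists(model: str, token: str | None = None) -> None:
        """Fail fast with a clear message if `model` is neither a local path nor a Hub repo."""
        if os.path.isdir(model):
            return  # Local directory: the config format is validated later.
        try:
            exists = HfApi(token=token).repo_exists(model)
        except GatedRepoError as e:
            raise ValueError(
                f"Model '{model}' is a gated repository. Pass a Hugging Face "
                "token that has been granted access.") from e
        if not exists:
            raise ValueError(
                f"Model '{model}' is not a local directory and was not found "
                "on the Hugging Face Hub. Check the model name for typos.")

Run before the config is parsed, a check along these lines would turn the stack trace above into a single actionable error.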

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
robertgshaw2-redhat added the usage label Feb 19, 2025
robertgshaw2-redhat changed the title from [Usage]: Misleading Error Message for V1 to [Usage]: [V1[ Misleading Error Messages Feb 19, 2025
robertgshaw2-redhat changed the title from [Usage]: [V1[ Misleading Error Messages to [Usage]: [V1] Misleading Error Messages Feb 19, 2025
robertgshaw2-redhat added the good first issue and help wanted labels Feb 19, 2025
@simo-hsieh

@robertgshaw2-redhat I'd like to work on this.

@simo-hsieh

@robertgshaw2-redhat I'd like to work on this.

I couldn't reproduce the same error using the same command.
Instead, I received customized error messages from Hugging Face.
I'll leave this to others who can reproduce the issue.

@davidxia
Contributor

Same here, I got:

$ VLLM_USE_V1=1 vllm serve neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic --disable-log-requests --no-enable-prefix-caching
...

ValueError: Invalid repository ID or local directory specified: 'neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic'.
Please verify the following requirements:
1. Provide a valid Hugging Face repository ID.
2. Specify a local directory that contains a recognized configuration file.
   - For Hugging Face models: ensure the presence of a 'config.json'.
   - For Mistral models: ensure the presence of a 'params.json'.
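For illustration, the local-directory half of that check could look something like the sketch below. This is not the code from #13724; the function name and return values are assumptions.

    # Illustrative sketch only -- not the code from #13724.
    from pathlib import Path


    def detect_config_format(model_dir: str) -> str:
        """Return which config flavor a local model directory contains."""
        path = Path(model_dir)
        if (path / "config.json").is_file():
            return "hf"       # Standard Hugging Face config.
        if (path / "params.json").is_file():
            return "mistral"  # Mistral-format checkpoint.
        raise ValueError(
            f"Local directory '{model_dir}' contains neither 'config.json' "
            "(Hugging Face format) nor 'params.json' (Mistral format); no "
            "supported config format was found.")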

@davidxia
Contributor

I think this issue is fixed by #13724? If so, this can be closed.

@ctdavi

ctdavi commented May 8, 2025

Agreed. This is fixed, or at least the behavior is entirely different now, after #13724.

I get the same post-#13724 message that davidxia got.

@mengbingrock

mengbingrock commented May 10, 2025

Hi @robertgshaw2-redhat, I appreciate the feedback. Even though the original error is not reproducible on some setups, I ran into similar issues, and I've made further modifications on top of #13724 to handle other cases such as internet connection failures.

I'm working on a PR here: #17938
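For illustration, here is a rough sketch of how a connectivity failure could be reported separately from a missing repository. This is not the code in #17938; the helper name and the exact exception types handled are assumptions.

    # Illustrative sketch only -- not the code from #17938.
    import requests
    from huggingface_hub.utils import OfflineModeIsEnabled


    def classify_hub_error(model: str, err: Exception) -> ValueError:
        """Map low-level Hub/HTTP errors to actionable messages."""
        if isinstance(err, (requests.exceptions.ConnectionError,
                            requests.exceptions.Timeout)):
            return ValueError(
                f"Could not reach the Hugging Face Hub while resolving "
                f"'{model}'. Check your internet connection, or point to a "
                "local copy of the model.")
        if isinstance(err, OfflineModeIsEnabled):
            return ValueError(
                f"Offline mode is enabled but '{model}' was not found in the "
                "local cache. Download the model first or unset HF_HUB_OFFLINE.")
        return ValueError(f"Failed to load the config for '{model}': {err}")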

mengbingrock linked a pull request (#17938) May 10, 2025 that will close this issue