contrib: support modelscope community #12664

tastelikefeet · 2025-03-31T02:37:00Z

Support gguf models of modelscope community.
linux/mac/windows HF/MS download tested.

ngxson

I think we need to re-think a bit about the implementation

From UX perspective, adding 4 more args may not be intuitive for most users, especially for most people outside china who won't use this (~~I understand the HF is not accessible from china~~ checked with HF team - HF is still accessible in china, but still we should allow switching model host in a way that is less confused)

What I'm thinking is to handle the case where user can add protocol prefix to the existing -m, like what we have in llama-run. So for example

hugging face: -m hf://user/model:quant (equivalent to -hf user/model:quant)
modelscope: -m ms://user/model:quant

common/arg.cpp

ngxson · 2025-03-31T10:05:20Z

What I'm thinking is to handle the case where user can add protocol prefix to the existing -m, like what we have in llama-run. So for example
* hugging face: `-m hf://user/model:quant` (equivalent to `-hf user/model:quant`)

* modelscope: `-m ms://user/model:quant`

Btw, we also have hf-mirror.com, so I think it's safe to say that adding one set of 4 arguments per host is not a scalable solution. So I would prefer to go with the protocol://... format

foldl · 2025-03-31T11:09:58Z

The abbreviation MS looks weird to me.

tastelikefeet · 2025-03-31T11:22:09Z

What I'm thinking is to handle the case where user can add protocol prefix to the existing -m, like what we have in llama-run. So for example
* hugging face: `-m hf://user/model:quant` (equivalent to `-hf user/model:quant`)

* modelscope: `-m ms://user/model:quant`
Btw, we also have hf-mirror.com, so I think it's safe to say that adding one set of 4 arguments per host is not a scalable solution. So I would prefer to go with the protocol://... format

This change was previously considered too significant for contributors(to add a new argument protocol for llama-cli and llama-server), so I chose a more conservative approach. Do we need to change to this way?

ngxson · 2025-03-31T12:02:17Z

This change was previously considered too significant for contributors(to add a new argument protocol for llama-cli and llama-server), so I chose a more conservative approach. Do we need to change to this way?

I still think having the protocol:// will be more extensible because:

It's easier to add a third provider in the future
We can also make it a positional arg (that's an idea for the future), so we can eliminate using -m and simply doing llama-cli protocol://user/model

Also, please correct if I'm wrong, but aren't modelscope and HF urls using the same structure? The format is: {base_domain}/resolve/main/{file}, then in this case I think a lot of codes in this PR is redundant, as we can simply have the base_domain replaceable.

ngxson

I would prefer a more simple approach. At the moment, this feature is kinda "nice-to-have", so we should start with something simple, then improve it later when more users use this.

common/common.cpp

tastelikefeet · 2025-03-31T12:12:08Z

This change was previously considered too significant for contributors(to add a new argument protocol for llama-cli and llama-server), so I chose a more conservative approach. Do we need to change to this way?

I still think having the protocol:// will be more extensible because:

It's easier to add a third provider in the future

We can also make it a positional arg (that's an idea for the future), so we can eliminate using -m and simply doing llama-cli protocol://user/model

Also, please correct if I'm wrong, but aren't modelscope and HF urls using the same structure? The format is: {base_domain}/resolve/main/{file}, then in this case I think a lot of codes in this PR is redundant, as we can simply have the base_domain replaceable.

No, the url may be a little different. HF is resolve/main, MS is resolve/master

yingdachen · 2025-03-31T14:19:04Z

I think a lot of discussions in this thread are quite constructive, which may result in a cleaner implementation that can facilitate llama.cpp users to access their needed model files from ModelScope.

At the same time, there are obvious values to having another alternative community. For one thing, unlike lightweight open-source codes, large model weights require much more than a "mirror" site for efficient and high-speed access. ModelScope is dedicated to making models more accessible to our over 15-million users. Obviously integration into popular tools/frameworks like llama.cpp can greatly facilitate the efforts. We also believe easier access to wider model selection can benefit users of llama.cpp, and the open source communities as a whole. After all, the easy access to gguf models on huggingface via mode-id must have been implemented out of the same belief that such integration can connect llama.cpp better with the model-ecosystems. Otherwise, standalone operations like "wget" can get gguf models as well :)

We will take the discussions today back to see how we can coordinate server-side protocol, to better accommodate client-side implementations that have been built around the hf ecosystem. @tastelikefeet

* master: (123 commits) cuda : add f32 to bf16 copy op (ggml-org#12806) llava: improve clip_ctx destructor to not memleak load_image_size (ggml-org#12834) llama : fix FA when KV cache is not used (i.e. embeddings) (ggml-org#12825) server : fix thread.join() on exit (ggml-org#12831) llava: add more helper functions to check projector types in clip context (ggml-org#12824) arg : Including limits file on AIX (ggml-org#12822) server : webui : Improve Chat Input with Auto-Sizing Textarea (ggml-org#12785) Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (ggml-org#12812) gguf-py : support lazy tensor splitting (ggml-org#12809) llama : Support llama 4 text-only (ggml-org#12791) opencl: better identify Adreno GPU (ggml-org#12760) hellaswag: display estimated score confidence interval (ggml-org#12797) cuda : fix HIP and MUSA BF16 (#0) sync : ggml ggml : simplify Arm fp16 CPU logic (ggml/1177) CUDA: don't convert BF16 weights to FP32 (ggml/1174) cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (ggml-org#12734) ci : no curl on ggml-ci (ggml-org#12796) cmake : enable curl by default (ggml-org#12761) ... # Conflicts: # common/arg.cpp # common/common.cpp # common/common.h

tastelikefeet · 2025-04-10T08:53:32Z

@ngxson @ggerganov We fixed the conversations and updated our backend code, and fixed some code that hard-coded Hugging Face endpoint to support ModelScope and other communities with the same downloading protocol, meanwhile we changed the HF_ENDPOINT to MODEL_ENDPOINT to make note developers with multiple communities support. cc @yingdachen

common/common.cpp

ngxson · 2025-04-11T07:28:39Z

common/common.h

@@ -543,6 +543,8 @@ struct ggml_threadpool_params ggml_threadpool_params_from_cpu_params(const cpu_p
 // clear LoRA adapters from context, then apply new list of adapters
 void common_set_adapter_lora(struct llama_context * ctx, std::vector<common_adapter_lora_info> & lora);

+std::string                   get_model_endpoint();


Maybe better to move this function to arg.cpp, but I'll do it later

ngxson · 2025-04-11T07:29:18Z

Thanks for the contrib, I'll merge this once the CI is green

* support download from modelscope * support login * remove comments * add arguments * fix code * fix win32 * test passed * fix readme * revert readme * change to MODEL_ENDPOINT * revert tail line * fix readme * refactor model endpoint * remove blank line * fix header * fix as comments * update comment * update readme --------- Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com>

tastelikefeet and others added 6 commits March 27, 2025 18:08

support download from modelscope

47672e5

support login

eef1c6e

remove comments

95a5274

add arguments

a996e2f

fix code

d756ebf

fix win32

c9aca3e

github-actions bot added the examples label Mar 31, 2025

tastelikefeet closed this Mar 31, 2025

tastelikefeet reopened this Mar 31, 2025

ngxson reviewed Mar 31, 2025

View reviewed changes

common/arg.cpp Outdated Show resolved Hide resolved

common/arg.cpp Outdated Show resolved Hide resolved

ngxson reviewed Mar 31, 2025

View reviewed changes

common/common.cpp Outdated Show resolved Hide resolved

common/common.cpp Outdated Show resolved Hide resolved

ngxson mentioned this pull request Apr 1, 2025

common : refactor downloading system, handle mmproj with -hf option #12694

Merged

tastelikefeet added 10 commits April 9, 2025 10:56

test passed

8925da2

fix readme

52f93f2

revert readme

76f48ed

change to MODEL_ENDPOINT

2765347

revert tail line

f3d0220

fix readme

4c2abb3

refactor model endpoint

da6dd67

remove blank line

539d3ed

fix header

e102d68

ngxson reviewed Apr 10, 2025

View reviewed changes

common/common.cpp Show resolved Hide resolved

tastelikefeet added 3 commits April 11, 2025 10:41

fix as comments

e20a42e

update comment

cc3d234

update readme

428c65a

ngxson reviewed Apr 11, 2025

View reviewed changes

ngxson approved these changes Apr 11, 2025

View reviewed changes

ngxson merged commit b2034c2 into ggml-org:master Apr 11, 2025
51 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

contrib: support modelscope community #12664

contrib: support modelscope community #12664

Uh oh!

tastelikefeet commented Mar 31, 2025 •

edited

Loading

Uh oh!

ngxson left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

ngxson commented Mar 31, 2025

Uh oh!

foldl commented Mar 31, 2025

Uh oh!

tastelikefeet commented Mar 31, 2025 •

edited

Loading

Uh oh!

ngxson commented Mar 31, 2025

Uh oh!

ngxson left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

tastelikefeet commented Mar 31, 2025

Uh oh!

yingdachen commented Mar 31, 2025

Uh oh!

tastelikefeet commented Apr 10, 2025

Uh oh!

Uh oh!

ngxson Apr 11, 2025

Uh oh!

ngxson commented Apr 11, 2025

Uh oh!

Uh oh!

Uh oh!

contrib: support modelscope community #12664

contrib: support modelscope community #12664

Uh oh!

Conversation

tastelikefeet commented Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngxson left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ngxson commented Mar 31, 2025

Uh oh!

foldl commented Mar 31, 2025

Uh oh!

tastelikefeet commented Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngxson commented Mar 31, 2025

Uh oh!

ngxson left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tastelikefeet commented Mar 31, 2025

Uh oh!

yingdachen commented Mar 31, 2025

Uh oh!

tastelikefeet commented Apr 10, 2025

Uh oh!

Uh oh!

ngxson Apr 11, 2025

Choose a reason for hiding this comment

Uh oh!

ngxson commented Apr 11, 2025

Uh oh!

Uh oh!

Uh oh!

tastelikefeet commented Mar 31, 2025 •

edited

Loading

ngxson left a comment •

edited

Loading

tastelikefeet commented Mar 31, 2025 •

edited

Loading

ngxson left a comment •

edited

Loading