LLM plugin for accessing Hugging Face Inference Providers - giving you access to 100+ open-weight models through a unified API.
This is a personal project that is still in development. Contributions and feedback are welcome, but please note that support may be limited.
Make sure LLM is installed on your machine.
Then clone this repository:
```bash
git clone https://github.com/sebington/llm-hf.git
cd llm-hf
llm install -e .
```

You need a Hugging Face access token with "Make calls to Inference Providers" permissions.
First, create a token at https://huggingface.co/settings/tokens/new?tokenType=fineGrained.
Then configure it using one of these methods:
Option 1: Using llm keys (recommended)
```bash
llm keys set hf
# <paste token here>
```
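To confirm the key was saved, you can list the names of stored keys (a standard LLM command, not specific to this plugin):

```bash
# Lists stored key names only, never their values
llm keys list
```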
Option 2: Using environment variable
```bash
export HF_TOKEN="your-token-here"
```
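Since the plugin reads the token from the environment, you can also supply it for a single invocation without exporting it (standard POSIX shell behavior; the prompt here is illustrative):

```bash
HF_TOKEN="your-token-here" llm -m meta-llama/Llama-3.1-8B-Instruct "Hello"
```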
The plugin provides an `llm hf` command group for managing Hugging Face models:

```bash
# List all available Hugging Face models
llm hf models

# Refresh the model list from the API and see what changed
llm hf refresh
```

The `llm hf refresh` command is particularly useful to:
- Check if new models have been added to Hugging Face Inference Providers
- See which models have been removed from the service
- Verify your token is working correctly
Alternative way to list models:
```bash
llm models | grep HuggingFaceChat
```

Both methods show ~116 models, dynamically fetched from the Hugging Face API.
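If you only want to confirm that count, `grep -c` prints the number of matching lines instead of the lines themselves (standard grep, nothing plugin-specific):

```bash
llm models | grep -c HuggingFaceChat
```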
Simply use the model name directly:
```bash
llm -m meta-llama/Llama-3.1-8B-Instruct "Write a poem about translation"
```

With options:
```bash
llm -m Qwen/Qwen2.5-Coder-32B-Instruct \
  -o temperature 0.7 \
  -o max_tokens 500 \
  "Write a Python function to sort a list"
```

With a specific provider:
```bash
llm -m meta-llama/Llama-3.1-8B-Instruct \
  -o provider sambanova \
  "What is the capital of France?"
```

In chat mode:
```bash
llm chat -m meta-llama/Llama-3.1-8B-Instruct
```

With a system prompt:
```bash
llm -m Qwen/Qwen2.5-Coder-32B-Instruct \
  -s "You are a helpful coding assistant" \
  "How do I sort a list in Python?"
```

The following options are supported:

- `provider` (optional): Specify a provider (e.g., `sambanova`, `together`, `fireworks-ai`, `groq`)
  - If not specified, Hugging Face automatically selects the best available provider
  - Note: Not all providers support all models
- `temperature`: Sampling temperature between 0.0 and 2.0 (default: provider default)
- `max_tokens`: Maximum number of tokens to generate (default: provider default)
- `top_p`: Nucleus sampling parameter between 0.0 and 1.0 (default: provider default)
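Putting these together, here is one way to combine a provider choice with the sampling options; the model, provider, and values are illustrative, not recommendations:

```bash
llm -m meta-llama/Llama-3.1-8B-Instruct \
  -o provider groq \
  -o temperature 0.2 \
  -o max_tokens 256 \
  -o top_p 0.9 \
  "Summarize the plot of Hamlet in two sentences"
```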
When using the provider option, you can choose from:
- `sambanova`
- `together`
- `fireworks-ai`
- `groq`
- `cerebras`
- `hyperbolic`
- `featherless-ai`
- `nebius`
- `novita`
- And more!
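Since provider support varies by model (see the note below), a quick shell loop is one way to compare a few of them. This sketch assumes each listed provider serves the example model; unsupported combinations will simply error:

```bash
# Run the same prompt against several providers for comparison
for p in sambanova together groq; do
  echo "== $p =="
  llm -m meta-llama/Llama-3.1-8B-Instruct -o provider "$p" "Say hi in French"
done
```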
Note: Each provider supports different models. If you request a model from a provider that doesn't support it, you'll get an error message.
All models available through Hugging Face Inference Providers are automatically discoverable using the commands above.
You can also browse the available models on the Hugging Face website. The plugin uses the same model list as the Hugging Face API, so any model shown in the playground should work with this plugin. Run `llm hf refresh` periodically to update your local model list.
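If you'd rather refresh on a schedule than by hand, a standard cron entry works; the timing below is illustrative:

```bash
# Hypothetical crontab line: refresh the model list every Monday at 09:00
0 9 * * 1 llm hf refresh
```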
All prompts and responses are automatically logged. View logs with:
```bash
llm logs
```

View the most recent entry:
```bash
llm logs -n 1
```

Apache 2.0