
llm-hf

An LLM plugin for Hugging Face Inference Providers, giving you access to 100+ open-weight models through a unified API.

Project Status

This is a personal project that is still in development. Contributions and feedback are welcome, but please note that support may be limited.

Installation

Make sure LLM is installed on your machine.

Then clone this repository:

git clone https://github.com/sebington/llm-hf.git
cd llm-hf
llm install -e .
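
To verify the installation, LLM's built-in plugin list should now include llm-hf:

llm plugins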

Configuration

You need a Hugging Face access token with the "Make calls to Inference Providers" permission.

First, create a token at https://huggingface.co/settings/tokens/new?tokenType=fineGrained.

Then configure it using one of these methods:

Option 1: Using llm keys (recommended)

llm keys set hf
<paste token here>

Option 2: Using environment variable

export HF_TOKEN="your-token-here"

Usage

Plugin Commands

The plugin provides an llm hf command group for managing Hugging Face models:

# List all available Hugging Face models
llm hf models

# Refresh the model list from the API and see what changed
llm hf refresh

The llm hf refresh command is particularly useful to:

  • Check if new models have been added to Hugging Face Inference Providers
  • See which models have been removed from the service
  • Verify your token is working correctly

Alternative way to list models:

llm models | grep HuggingFaceChat

Both methods show the same list, dynamically fetched from the Hugging Face API (around 116 models at the time of writing).
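
For a quick count of available models (this assumes llm hf models prints one model per line):

llm hf models | wc -l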

Basic Usage

Simply use the model name directly:

llm -m meta-llama/Llama-3.1-8B-Instruct "Write a poem about translation"

With options:

llm -m Qwen/Qwen2.5-Coder-32B-Instruct \
  -o temperature 0.7 \
  -o max_tokens 500 \
  "Write a Python function to sort a list"

With a specific provider:

llm -m meta-llama/Llama-3.1-8B-Instruct \
  -o provider sambanova \
  "What is the capital of France?"

In chat mode:

llm chat -m meta-llama/Llama-3.1-8B-Instruct

With system prompt:

llm -m Qwen/Qwen2.5-Coder-32B-Instruct \
  -s "You are a helpful coding assistant" \
  "How do I sort a list in Python?"

Available Options

  • provider (optional): Specify a provider (e.g., sambanova, together, fireworks-ai, groq)
    • If not specified, Hugging Face automatically selects the best available provider
    • Note: Not all providers support all models
  • temperature: Sampling temperature between 0.0 and 2.0 (default: provider default)
  • max_tokens: Maximum number of tokens to generate (default: provider default)
  • top_p: Nucleus sampling parameter between 0.0 and 1.0 (default: provider default)
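
These options can be combined in a single call. For example (the model, provider, and values here are illustrative):

llm -m meta-llama/Llama-3.1-8B-Instruct \
  -o provider groq \
  -o temperature 0.2 \
  -o max_tokens 300 \
  -o top_p 0.9 \
  "Summarize the rules of chess in three sentences"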

Supported Providers

When using the provider option, you can choose from:

  • sambanova
  • together
  • fireworks-ai
  • groq
  • cerebras
  • hyperbolic
  • featherless-ai
  • nebius
  • novita
  • And more!

Note: Each provider supports different models. If you request a model from a provider that doesn't support it, you'll get an error message.
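
The provider option also works in chat mode, since llm chat accepts the same -o options. For example (assuming the model is available on groq):

llm chat -m meta-llama/Llama-3.1-8B-Instruct -o provider groq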

Finding More Models

All models available through Hugging Face Inference Providers are automatically discoverable using the commands above.

You can also browse available models in the Hugging Face Inference Playground.

The plugin uses the same model list as the Hugging Face API, so any model shown in the playground should work with this plugin. Run llm hf refresh periodically to update your local model list.

Logging

All prompts and responses are automatically logged. View logs with:

llm logs

View the most recent entry:

llm logs -n 1
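
To see only entries for a specific model, llm logs can filter with -m (per LLM's documentation):

llm logs -m meta-llama/Llama-3.1-8B-Instruct -n 3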

License

Apache 2.0
