Run llama.cpp in a GPU accelerated Docker container.
By default, the service requires a CUDA-capable GPU with at least 8 GB of VRAM. If you don't have an Nvidia GPU with CUDA support, the CPU version will be built and used instead.
```shell
make build
make llama-3-8b
make up
```
After starting up, the chat server will be available at http://localhost:8080.
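Once the server is running, it exposes llama.cpp's OpenAI-compatible HTTP API. As a minimal sketch (assuming the service from `make up` is listening on port 8080), a chat completion can be requested with `curl`:

```shell
# Ask the local chat server for a completion via the
# OpenAI-compatible endpoint
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Say hello in one sentence."}
        ]
      }'
```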
Options are specified as environment variables in the `docker-compose.yml` file.
By default, the following options are set:
- `GGML_CUDA_NO_PINNED`: Disable pinned memory for compatibility (default is `1`)
- `LLAMA_ARG_CTX_SIZE`: The context size to use (default is `2048`)
- `LLAMA_ARG_MODEL`: The name of the model to use (default is `/models/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf`)
- `LLAMA_ARG_N_GPU_LAYERS`: The number of layers to run on the GPU (default is `99`)
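As a sketch of how these might be overridden in `docker-compose.yml` (the service name `llama` is an assumption here; use whatever name your compose file defines):

```yaml
services:
  llama:
    environment:
      # Double the context window from the 2048-token default
      - LLAMA_ARG_CTX_SIZE=4096
      # Offload all layers to the GPU (matches the default of 99)
      - LLAMA_ARG_N_GPU_LAYERS=99
```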
See the llama.cpp documentation for the complete list of server options.
The `docker-entrypoint.sh` script has targets for downloading popular models. Run `./docker-entrypoint.sh --help` to list available models.
Download models by running `./docker-entrypoint.sh <model>`, where `<model>` is the name of the model. By default, these targets download the `_Q5_K_M.gguf` versions of the models. These models are quantized to 5 bits, which provides a good balance between speed and accuracy.
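For example, to download the model used by the default configuration (the `llama-3-8b` target from the quick-start commands above):

```shell
# Download meta-llama-3.1-8b-instruct at Q5_K_M quantization
./docker-entrypoint.sh llama-3-8b
```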
Confused about which model to use? Below is a list of popular models, ranked by ELO rating. Generally, the higher the ELO rating, the better the model.
| Target | Model | Parameters | Size | ~Score | ~ELO | Notes |
|---|---|---|---|---|---|---|
| deepseek-r1-qwen-14b | deepseek-r1-distill-qwen-14b | 14B | 10.5 GB | 38.22 | 1360 | The best small thinking model |
| gemma-3-27b | gemma-3-27b-it | 27B | 19.27 GB | 36.17+ | 1339 | Google's best medium model |
| mistral-small-3 | mistral-small-3.1-24b-instruct | 24B | 16.76 GB | 29.92+ | 1214 | Mistral AI's best small model |
| llama-3-8b | meta-llama-3.1-8b-instruct | 8B | 5.73 GB | 23.76 | 1176 | Meta's best small model |
| phi-4-mini | phi-4-mini-instruct | 4B | 2.85 GB | 29.41 | 1070++ | Microsoft's best tiny model |
> **Note:** Values with `+` are minimum estimates from previous versions of the model due to missing data.