A high-throughput and memory-efficient inference and serving engine for LLMs
Python · 66.1k stars · 12.2k forks
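A minimal sketch of offline batched generation with vLLM's Python API; the model ID is just an example, and any supported Hugging Face model works:

```python
# Offline batched inference with vLLM's Python API.
# The model ID is an example; weights download on first run.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is", "The largest planet is"]
params = SamplingParams(temperature=0.8, max_tokens=32)

llm = LLM(model="facebook/opt-125m")
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```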
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Python · 2.5k stars · 334 forks
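As a rough sketch of the one-shot compression workflow, mirroring the pattern in the project's examples (exact import paths and arguments can vary by version; the model, dataset, and output directory below are examples):

```python
# One-shot W4A16 (4-bit weight) quantization sketch with llm-compressor;
# the resulting output_dir can be loaded directly by vLLM.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model ID
    dataset="open_platypus",                     # calibration dataset
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="TinyLlama-1.1B-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```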
Common recipes to run vLLM
Jupyter Notebook · 300 stars · 111 forks
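Many recipes target vLLM's OpenAI-compatible server. As a minimal sketch, a server started with `vllm serve <model>` can be queried with the standard `openai` client (the URL, port, and model ID below are assumptions):

```python
# Querying a running vLLM OpenAI-compatible server, e.g. one started with
# `vllm serve facebook/opt-125m`. Base URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="facebook/opt-125m",
    prompt="The capital of France is",
    max_tokens=16,
)
print(resp.choices[0].text)
```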
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
Python · 173 stars · 22 forks
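A hedged sketch of how a draft model is wired into vLLM for speculative decoding; the `speculative_config` keys and both model IDs below are illustrative and version-dependent:

```python
# Draft-model speculative decoding sketch: a small draft model proposes
# tokens that the larger target model verifies in parallel.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # target model (example)
    speculative_config={
        "model": "meta-llama/Llama-3.2-1B-Instruct",  # draft model (example)
        "num_speculative_tokens": 5,  # draft tokens proposed per step
    },
)
out = llm.generate(["Hello, world:"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```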
Intelligent router for mixture-of-models
Go · 2.6k stars · 358 forks
Community-maintained hardware plugin for vLLM on Intel Gaudi
Code for the vLLM CI and performance benchmark infrastructure
Community-maintained hardware plugin for vLLM on Ascend
Daily summaries of merged vLLM PRs
A framework for efficient model inference with omni-modality models
TPU inference for vLLM, with unified JAX and PyTorch support.
vLLM XPU kernels for Intel GPUs
Evaluate and enhance your LLM deployments for real-world inference needs