
Commit d305093

docs: add concepts and defs to README.md
Signed-off-by: Shane Utt <[email protected]>
Parent: 1ba13f3

File tree

1 file changed (+24 -1 lines)


README.md (+24 -1)
# Gateway API Inference Extension (GIE)

This project offers tools for [AI Inference], enabling developers to build [Inference Gateways].

[AI Inference]:https://www.digitalocean.com/community/tutorials/llm-inference-optimization
[Inference Gateways]:#concepts-and-definitions

## Concepts and Definitions

AI/ML is changing rapidly, and [Inference] goes beyond basic networking to include complex traffic routing and optimizations. Below are key terms for developers:

- **Scheduler**: Decides which endpoint is optimal (best cost / best performance) for an inference request, based on the `Metrics and Capabilities` reported by [Model Serving Platforms] (see the sketch after this list).
- **Metrics and Capabilities**: Data that model serving platforms provide about their performance, availability, and capabilities, used to optimize routing. This includes things like [Prefix Cache] status or [LoRA Adapters] availability.
- **Inference Gateway**: A proxy/load-balancer that has been coupled with a `Scheduler` to ensure optimized routing decisions for inference requests.
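
To make the `Scheduler` idea above concrete, here is a minimal, illustrative Python sketch (an editor's assumption, not this project's actual API or code): it scores hypothetical per-endpoint metrics such as queue depth, prefix-cache status, and LoRA adapter availability, then picks the best endpoint for a request.

```python
from dataclasses import dataclass

@dataclass
class EndpointMetrics:
    """Hypothetical 'Metrics and Capabilities' reported by a model serving endpoint."""
    address: str
    queue_depth: int        # requests currently waiting on this endpoint
    prefix_cache_hit: bool  # the request's prompt prefix is already cached here
    has_lora_adapter: bool  # the requested LoRA adapter is loaded here

def schedule(endpoints: list[EndpointMetrics], needs_lora: bool) -> str:
    """Return the address of the endpoint with the best (lowest) score."""
    def score(ep: EndpointMetrics) -> float:
        s = float(ep.queue_depth)      # prefer less-loaded endpoints
        if ep.prefix_cache_hit:
            s -= 5.0                   # a cached prefix reduces prefill cost
        if needs_lora and not ep.has_lora_adapter:
            s += 100.0                 # strongly avoid endpoints missing the adapter
        return s
    return min(endpoints, key=score).address

# Example: a request that needs a LoRA adapter goes to the endpoint that has it,
# even though that endpoint has a deeper queue.
endpoints = [
    EndpointMetrics("10.0.0.1:8000", queue_depth=2, prefix_cache_hit=True, has_lora_adapter=False),
    EndpointMetrics("10.0.0.2:8000", queue_depth=6, prefix_cache_hit=False, has_lora_adapter=True),
]
print(schedule(endpoints, needs_lora=True))  # -> 10.0.0.2:8000
```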
For deeper insights and more advanced concepts, refer to our [proposals](/docs/proposals).
[Inference]:https://www.digitalocean.com/community/tutorials/llm-inference-optimization
[Gateway API]:https://github.com/kubernetes-sigs/gateway-api
[Model Serving Platforms]:https://ubuntu.com/blog/guide-to-ml-model-serving
[Prefix Cache]:https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html
[LoRA Adapters]:https://docs.vllm.ai/en/stable/features/lora.html

## Technical Overview
This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)-capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - into an **inference gateway**, supporting inference platform teams that self-host large language models on Kubernetes. This integration makes it easy to expose your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off the cluster, to control access to them, or to integrate your self-hosted models alongside model-as-a-service providers in a higher-level **AI Gateway** such as LiteLLM, Solo AI Gateway, or Apigee.
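
As a client-side illustration of the "OpenAI-compatible chat completion endpoints" mentioned above, the sketch below sends a chat completion request through such a gateway; the gateway URL and model name are placeholders, and any authentication or headers your gateway requires are omitted.

```python
import requests

# Placeholder address of an inference gateway exposing an OpenAI-compatible API.
GATEWAY_URL = "http://inference-gateway.example.com"

resp = requests.post(
    f"{GATEWAY_URL}/v1/chat/completions",
    json={
        "model": "my-self-hosted-model",  # placeholder model name
        "messages": [{"role": "user", "content": "Summarize what an inference gateway does."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```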
