Commit 6fb4cef

docs: add concepts and definitions to README.md
Signed-off-by: Shane Utt <[email protected]>
1 parent c8d0d62 commit 6fb4cef

109 files changed, +94032 -1 lines changed

README.md

+50-1
@@ -2,7 +2,56 @@
 [![Go Reference](https://pkg.go.dev/badge/sigs.k8s.io/gateway-api-inference-extension.svg)](https://pkg.go.dev/sigs.k8s.io/gateway-api-inference-extension)
 [![License](https://img.shields.io/github/license/kubernetes-sigs/gateway-api-inference-extension)](/LICENSE)
 
-# Gateway API Inference Extension
+# Gateway API Inference Extension (GIE)
+
+This project offers tools for AI Inference, enabling developers to build [Inference Gateways].
+
+[Inference Gateways]:#concepts-and-definitions
+
+## Concepts and Definitions
+
+The following are some key industry terms that are important to understand for
+this project:
+
+- **Model**: A generative AI model that has learned patterns from data and is
+used for inference. Models vary in size and architecture, from smaller
+domain-specific models to massive multi-billion parameter neural networks that
+are optimized for diverse language tasks.
+- **Inference**: The process of running a generative AI model, such as a large
+language model, diffusion model, etc., to generate text, embeddings, or other
+outputs from input data.
+- **Model server**: A service (in our case, containerized) responsible for
+receiving inference requests and returning predictions from a model.
+- **Accelerator**: Specialized hardware, such as Graphics Processing Units
+(GPUs), that can be attached to Kubernetes nodes to speed up computations,
+particularly for training and inference tasks.
+
+The following terms are more specific to this project:
+
+- **Scheduler**: Makes decisions about which endpoint is optimal (best cost /
+best performance) for an inference request based on `Metrics and Capabilities`
+from [Model Serving](/docs/proposals/003-model-server-protocol/README.md).
+- **Metrics and Capabilities**: Data provided by model serving platforms about
+performance, availability and capabilities to optimize routing. Includes
+things like [Prefix Cache] status or [LoRA Adapters] availability.
+- **Endpoint Selector**: A `Scheduler` combined with `Metrics and Capabilities`
+systems. Together these are often referred to as an [Endpoint Selection Extension]
+(sometimes also called an "endpoint picker", or "EPP").
+- **Inference Gateway**: A proxy/load-balancer which has been coupled with an
+`Endpoint Selector`. It provides optimized routing and load balancing for
+serving Kubernetes self-hosted generative Artificial Intelligence (AI)
+workloads. It simplifies the deployment, management, and observability of AI
+inference workloads.
+
+For deeper insights and more advanced concepts, refer to our [proposals](/docs/proposals).
+
+[Inference]:https://www.digitalocean.com/community/tutorials/llm-inference-optimization
+[Gateway API]:https://github.com/kubernetes-sigs/gateway-api
+[Prefix Cache]:https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html
+[LoRA Adapters]:https://docs.vllm.ai/en/stable/features/lora.html
+[Endpoint Selection Extension]:https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension
+
+## Technical Overview
 
 This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)-capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **inference gateway** - supporting inference platform teams self-hosting large language models on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.

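As a rough illustration of the **Scheduler** and **Metrics and Capabilities** definitions added in the README hunk above, the following Go sketch picks an endpoint by scoring a few reported signals. Every name here (`EndpointMetrics`, `score`, `PickEndpoint`) and every weight is an assumption made for illustration. None of it is taken from the project's actual API or scheduling logic.

```go
package main

import "fmt"

// EndpointMetrics is a hypothetical snapshot of the "Metrics and Capabilities"
// a model server might report. The field names are illustrative only.
type EndpointMetrics struct {
	Name           string
	QueueDepth     int             // requests waiting on this model server
	PrefixCacheHit bool            // whether the request prefix is already cached
	LoadedAdapters map[string]bool // LoRA adapters currently loaded
}

// score returns a lower-is-better cost for serving a request that needs the
// given LoRA adapter (empty string means no adapter). The penalty values are
// arbitrary assumptions chosen to make the trade-off visible.
func score(m EndpointMetrics, adapter string) int {
	cost := m.QueueDepth
	if !m.PrefixCacheHit {
		cost += 5 // assumed cost of recomputing the prompt prefix
	}
	if adapter != "" && !m.LoadedAdapters[adapter] {
		cost += 10 // assumed cost of loading a missing adapter
	}
	return cost
}

// PickEndpoint returns the name of the lowest-cost endpoint, or false when
// no endpoints are available. Ties go to the earliest endpoint in the slice.
func PickEndpoint(endpoints []EndpointMetrics, adapter string) (string, bool) {
	if len(endpoints) == 0 {
		return "", false
	}
	best := endpoints[0]
	bestCost := score(best, adapter)
	for _, e := range endpoints[1:] {
		if c := score(e, adapter); c < bestCost {
			best, bestCost = e, c
		}
	}
	return best.Name, true
}

func main() {
	endpoints := []EndpointMetrics{
		{Name: "pod-a", QueueDepth: 2, PrefixCacheHit: false, LoadedAdapters: map[string]bool{"sql-lora": true}},
		{Name: "pod-b", QueueDepth: 4, PrefixCacheHit: true, LoadedAdapters: map[string]bool{"sql-lora": true}},
		{Name: "pod-c", QueueDepth: 0, PrefixCacheHit: false, LoadedAdapters: map[string]bool{}},
	}
	name, ok := PickEndpoint(endpoints, "sql-lora")
	fmt.Println(name, ok) // prints "pod-b true"
}
```

With the sample data in `main`, `pod-b` has the lowest cost (4, versus 7 for `pod-a` and 15 for `pod-c`), so it is selected even though its queue is the longest. A real endpoint picker would weigh many more signals, and the proposals linked from the README describe the project's own approach.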
@@ -0,0 +1,23 @@
+# Patterns to ignore when building packages.
+# This supports shell glob matching, relative path matching, and
+# negation (prefixed with !). Only one pattern per line.
+.DS_Store
+# Common VCS dirs
+.git/
+.gitignore
+.bzr/
+.bzrignore
+.hg/
+.hgignore
+.svn/
+# Common backup files
+*.swp
+*.bak
+*.tmp
+*.orig
+*~
+# Various IDEs
+.project
+.idea/
+*.tmproj
+.vscode/
@@ -0,0 +1,6 @@
+apiVersion: v2
+appVersion: 1.16.0
+description: A Helm chart for the kgateway project CRDs
+name: kgateway-crds
+type: application
+version: v2.0.0
