
Adding Design Principles #596


Open: wants to merge 1 commit into main
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -52,6 +52,7 @@ nav:
- Introduction: index.md
- Concepts:
API Overview: concepts/api-overview.md
Design Principles: concepts/design-principles.md
Conformance: concepts/conformance.md
Roles and Personas: concepts/roles-and-personas.md
- Implementations: implementations.md
52 changes: 52 additions & 0 deletions site-src/concepts/design-principles.md
@@ -0,0 +1,52 @@
# Design Principles

These principles guide our efforts to build flexible [Gateway API] extensions
that empower the development of high-performance [AI Inference] routing
technologies—balancing rapid delivery with long-term growth.

!!! note "Inference Gateways"

    For simplicity, this document refers to Gateway API Gateways that are
    composed with AI Inference extensions as "Inference Gateways".

[Gateway API]: https://github.com/kubernetes-sigs/gateway-api
Contributor

s/[Gateway]/[Gateway API]/

[AI Inference]:https://www.arm.com/glossary/ai-inference


## Prioritize stability of the core interfaces

The most critical parts of this project are the interfaces between components. To encourage both controller and extension developers to integrate with this project, we need to prioritize the stability of these interfaces.
Contributor

s/is/are/

Although we can extend these interfaces in the future, it's critical that the core be stable as soon as possible.

When describing "core interfaces", we are referring to both of the following:

### 1. Gateway -> Endpoint Picker
At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.
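To make the shape of this first interface concrete, here is a minimal Go sketch of an Endpoint Picker contract. Every name in it (`PickRequest`, `Endpoint`, `EndpointPicker`, `Pick`, `firstPicker`) is hypothetical and does not reflect the project's actual protocol; it only illustrates the division of labor described above: the Gateway supplies request information and candidate endpoints, and the picker returns the endpoint(s) to route to.

```go
package main

import (
	"errors"
	"fmt"
)

// PickRequest is a hypothetical view of what a Gateway might hand to an
// Endpoint Picker: the requested model plus request headers.
type PickRequest struct {
	Model   string
	Headers map[string]string
}

// Endpoint is a hypothetical candidate backend, such as a model server Pod.
type Endpoint struct {
	Name    string
	Address string // "ip:port"
}

// EndpointPicker is a hypothetical contract: given a request and the
// candidate endpoints, return the endpoint(s) the Gateway should route to.
type EndpointPicker interface {
	Pick(req PickRequest, candidates []Endpoint) ([]Endpoint, error)
}

// firstPicker is a trivial implementation that exists only to exercise the
// interface; it always selects the first candidate.
type firstPicker struct{}

func (firstPicker) Pick(_ PickRequest, candidates []Endpoint) ([]Endpoint, error) {
	if len(candidates) == 0 {
		return nil, errors.New("no candidate endpoints")
	}
	return candidates[:1], nil
}

func main() {
	var p EndpointPicker = firstPicker{}
	eps := []Endpoint{
		{Name: "pod-a", Address: "10.0.0.1:8000"},
		{Name: "pod-b", Address: "10.0.0.2:8000"},
	}
	picked, err := p.Pick(PickRequest{Model: "example-model"}, eps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("route to", picked[0].Address)
}
```

Keeping a contract like this small is what would let Gateway implementations and picker implementations evolve independently of each other.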

### 2. Endpoint Picker -> Model Server Framework
This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
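The second interface can be sketched the same way. The Go code below assumes, purely for illustration, that a model server reports a health signal plus two load metrics (queue length and KV-cache utilization) and that the picker filters on them before choosing. The type, field, and function names and the 0.9 threshold are hypothetical, not a published metrics contract.

```go
package main

import "fmt"

// ServerState is a hypothetical summary of what an Endpoint Picker might
// collect from a model server: a health signal plus load metrics.
type ServerState struct {
	Name         string
	Healthy      bool    // e.g. derived from a health-check probe
	QueueLength  int     // requests waiting to be served
	KVCacheUsage float64 // fraction of KV cache in use, 0.0 to 1.0
}

// eligible drops unhealthy servers and servers whose KV-cache usage is at or
// above maxKVCache. Both criteria are illustrative assumptions.
func eligible(servers []ServerState, maxKVCache float64) []ServerState {
	var out []ServerState
	for _, s := range servers {
		if s.Healthy && s.KVCacheUsage < maxKVCache {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	servers := []ServerState{
		{Name: "pod-a", Healthy: true, QueueLength: 3, KVCacheUsage: 0.40},
		{Name: "pod-b", Healthy: false, QueueLength: 0, KVCacheUsage: 0.10},
		{Name: "pod-c", Healthy: true, QueueLength: 1, KVCacheUsage: 0.95},
	}
	// Only pod-a is both healthy and below the cache threshold.
	for _, s := range eligible(servers, 0.9) {
		fmt.Println(s.Name)
	}
}
```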


## Our presets are finely tuned
Collaborator

I think we can more clearly define this.

We want extensibility, and customization. But I think it's very important that we have a turnkey solution that works for the average person.

To word another way, I think good defaults/presets can fall under a larger umbrella of: We want a strong OOB experience for those who don't want to deeply customize. And our later points are about making this easily extensible and adaptable for those who do want to customize. Maybe that's implicit as a part of K8s.

Contributor

> We want a strong OOB experience

+1... I call this "batteries included"


Our defaults, shaped by extensive experience with leading model serving platforms and APIs, are designed to give most Inference Gateway users a great out-of-the-box experience without extensive configuration or customization.
Contributor

> AI Gateway

We need to standardize the language in the project. From my understanding, we should use "inference gateway" instead of "AI gateway." We need to do the same with the EPP. For example, the docs refer to the EPP as the "Endpoint Selection Extension". I also refer to the EPP as ESE in kubernetes/website#49898.



Contributor

Delete L35

## Encourage innovation via extensibility

This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.
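One way to picture this principle: custom routing logic receives plain data and returns a score, with no Kubernetes controller or networking code involved. The Go sketch below assumes a hypothetical `Scorer` function type and a hypothetical `pick` helper that sums scores across plugins and chooses the highest total; neither is an existing API of this project.

```go
package main

import "fmt"

// Candidate is a hypothetical plain-data view handed to custom logic:
// no Kubernetes clients or networking stack required.
type Candidate struct {
	Name         string
	QueueLength  int
	KVCacheUsage float64
}

// Scorer is a hypothetical extension point; higher scores are preferred.
type Scorer func(c Candidate) float64

// pick returns the candidate with the highest summed score across all
// scorers. Ties keep the earlier candidate; ok is false when there are none.
func pick(cands []Candidate, scorers ...Scorer) (name string, ok bool) {
	best := 0.0
	for i, c := range cands {
		total := 0.0
		for _, s := range scorers {
			total += s(c)
		}
		if i == 0 || total > best {
			best, name, ok = total, c.Name, true
		}
	}
	return name, ok
}

func main() {
	// A researcher-supplied scorer that prefers short queues.
	shortQueue := func(c Candidate) float64 { return -float64(c.QueueLength) }
	// Another that prefers free KV cache.
	freeCache := func(c Candidate) float64 { return 1 - c.KVCacheUsage }

	cands := []Candidate{
		{Name: "pod-a", QueueLength: 4, KVCacheUsage: 0.2},
		{Name: "pod-b", QueueLength: 1, KVCacheUsage: 0.6},
	}
	name, _ := pick(cands, shortQueue, freeCache)
	fmt.Println(name)
}
```

With these two illustrative scorers, `pod-b` wins (-1 + 0.4 = -0.6 versus -4 + 0.8 = -3.2): its shorter queue outweighs `pod-a`'s freer cache. Swapping in a new scoring idea only means writing another small function.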
Comment on lines +36 to +38
Member

Love it 👍

Contributor

> make it as easy as possible for AI researchers to experiment with custom scheduling...

Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call?



## Objectives over instructions

The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation-specific APIs and grow into standards as the concepts become more stable and widespread.
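A small Go contrast may help illustrate the difference. The objective-style type below lets a user state _what_ matters (how critical a workload is), while the commented-out alternative exposes weights tied to one particular scoring algorithm. All names (`Criticality`, `WorkloadObjective`, `shouldShed`, `SchedulerTuning`) are hypothetical and chosen only for this sketch.

```go
package main

import "fmt"

// Criticality is a hypothetical objective-style field: users state what
// they need, not how the picker should achieve it.
type Criticality string

const (
	Critical  Criticality = "Critical"
	Sheddable Criticality = "Sheddable"
)

// WorkloadObjective is a hypothetical objective-style API object.
type WorkloadObjective struct {
	Model       string
	Criticality Criticality
}

// An instruction-style API (the pattern this principle cautions against)
// might instead expose knobs tied to today's algorithm, e.g.:
//
//	type SchedulerTuning struct{ QueueWeight, KVCacheWeight float64 }
//
// Such fields lose their meaning if the algorithm changes.

// shouldShed is one possible way a picker could honor the objective under
// load; the policy stays an implementation detail that can evolve freely.
func shouldShed(obj WorkloadObjective, overloaded bool) bool {
	return overloaded && obj.Criticality == Sheddable
}

func main() {
	batch := WorkloadObjective{Model: "example-model", Criticality: Sheddable}
	chat := WorkloadObjective{Model: "example-model", Criticality: Critical}
	fmt.Println(shouldShed(batch, true), shouldShed(chat, true))
}
```

If the scoring algorithm is later replaced, the objective-style field still means the same thing, whereas algorithm-specific weights would have to be deprecated.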
Comment on lines +41 to +43
Member

This section, more than any other section, as a "Design Principle" is not resonating with me just yet. To me this reads like it's trying to talk about scope control. Could you please help me to better understand the intent here, by providing a somewhat detailed example of a situation that could occur which would run counter to this principle? I think that would help me to better understand what it's trying to convey 🤔

Contributor

One example is configuration options for the scheduling algorithm itself, some of those configuration parameters may only be relevant to the current iteration of algorithm implementation.

Member

Leave this as-is for now, and consider my suggestion resolved. I'll bring it up for a community call, doesn't need to hold up the PR.

Contributor

> One example is configuration options for the scheduling algorithm itself...

It sounds like we should have a scheduler API with EPP consuming it.

Collaborator

Suggested change
The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.
The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.



## Composable components and reducing reinvention
While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established in Kubernetes. Our focus is on creating a layer of composable components that can be assembled together with other Kubernetes components. This approach empowers engineers to use our solution as a building block, combining established technologies like Gateway API with our extensible model to build higher-level solutions.
Collaborator

@kfswain Mar 31, 2025

Consider rewording, perhaps this is personal bias, but the first sentence reads as if to ward off the reader from attempting to implement something new.

The later sentences focus on the value of using what has already been built, which I think is what we are going for. Perhaps move the concept of the first sentence to the end as something like:

Should you encounter a limitation, consider how existing tooling may be extended or improved first. Suggestions always welcomed (and encouraged) at our: sync-link-goes-here.



## Additions to the API should be carefully prioritized

Every addition to the API should take the principles described above into account. Because the goal of the API is to encourage a highly extensible ecosystem, each additional feature in the API raises the barrier to entry for any new controller or extension. Our top priority should be concepts that we expect to be broadly implementable and useful. The extensible nature of this API will allow each individual implementation to experiment with new features via custom flags or APIs before they become part of the core API surface.