Commit 5127127

Adding Design Principles
1 parent 16ded66 commit 5127127

2 files changed: +39 -0 lines changed

mkdocs.yml

+1
@@ -52,6 +52,7 @@ nav:
 - Introduction: index.md
 - Concepts:
     API Overview: concepts/api-overview.md
+    Design Principles: concepts/design-principles.md
     Conformance: concepts/conformance.md
     Roles and Personas: concepts/roles-and-personas.md
 - Implementations: implementations.md
+38
@@ -0,0 +1,38 @@
# Design Principles

## Focus on the core interfaces

There are two interfaces of note here:

### 1. Gateway -> Endpoint Picker
At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.

### 2. Endpoint Picker -> Model Server Framework
This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
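
As a rough illustration of the two interfaces above, the following Python sketch models each one as a typed protocol. Every name in it (`RequestInfo`, `EndpointMetrics`, `MetricsSource`, `EndpointPicker`, the `queue_depth` field) is hypothetical and chosen only for illustration; the project's actual interfaces may look quite different.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RequestInfo:
    """Hypothetical: information a Gateway passes to an Endpoint Picker."""
    model: str
    headers: dict[str, str]


@dataclass(frozen=True)
class EndpointMetrics:
    """Hypothetical: health and load reported by a model server."""
    healthy: bool
    queue_depth: int


class MetricsSource(Protocol):
    """Interface 2 (sketch): Endpoint Picker -> Model Server Framework."""

    def metrics(self, endpoint: str) -> EndpointMetrics: ...


class EndpointPicker(Protocol):
    """Interface 1 (sketch): Gateway -> Endpoint Picker."""

    def pick(self, request: RequestInfo, candidates: list[str]) -> list[str]: ...


class StaticMetrics:
    """Toy MetricsSource backed by a fixed table, for demonstration only."""

    def __init__(self, table: dict[str, EndpointMetrics]) -> None:
        self._table = table

    def metrics(self, endpoint: str) -> EndpointMetrics:
        return self._table[endpoint]


class ShortestQueuePicker:
    """Toy EndpointPicker: choose the healthy endpoint with the shortest queue."""

    def __init__(self, source: MetricsSource) -> None:
        self._source = source

    def pick(self, request: RequestInfo, candidates: list[str]) -> list[str]:
        healthy = []
        for endpoint in candidates:
            m = self._source.metrics(endpoint)
            if m.healthy:
                healthy.append((m.queue_depth, endpoint))
        if not healthy:
            return []  # nothing routable; the Gateway decides what to do next
        return [min(healthy)[1]]
```

The point of the split is that the Gateway only ever calls `pick`, and the picker only ever calls `metrics`, so either side can be replaced without touching the other.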

Although we can extend these interfaces in the future, it’s critical to get these right early in the project and stabilize them as soon as possible. We want to be able to give controller and extension developers a stable target to build against.

## The default out-of-the-box experience should be compelling

We want to ensure that our defaults, including our reference Endpoint Picker, are tuned well enough that most Inference Gateway users will have a great experience without significant customization.

## Encourage innovation via extensibility

This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.
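
One way to picture this goal: a researcher supplies only a scoring function, and everything else is handled for them. The sketch below assumes a hypothetical snapshot format (endpoint address mapped to named metrics) and hypothetical metric names (`queue_depth`, `kv_cache_utilization`); none of these are defined by this project.

```python
from typing import Callable, Mapping

# Hypothetical snapshot format: endpoint address -> metric name -> value.
Snapshot = Mapping[str, Mapping[str, float]]
# A scoring function maps one endpoint's metrics to a cost (lower is better).
ScoreFn = Callable[[Mapping[str, float]], float]


def pick_endpoint(snapshot: Snapshot, score: ScoreFn) -> str:
    """Return the endpoint with the lowest score; ties go to the lowest address."""
    if not snapshot:
        raise ValueError("no candidate endpoints")
    # Sorting first makes tie-breaking deterministic, since min() keeps the
    # first minimal element it encounters.
    return min(sorted(snapshot), key=lambda endpoint: score(snapshot[endpoint]))


def queue_only_score(metrics: Mapping[str, float]) -> float:
    """Baseline experiment: prefer the shortest queue."""
    return metrics["queue_depth"]


def cache_aware_score(metrics: Mapping[str, float]) -> float:
    """Alternative experiment: also penalize a nearly full KV cache.

    The weight of 10.0 is arbitrary and only illustrates a tunable knob.
    """
    return metrics["queue_depth"] + 10.0 * metrics["kv_cache_utilization"]


snapshot = {
    "10.0.0.1:8000": {"queue_depth": 2.0, "kv_cache_utilization": 0.9},
    "10.0.0.2:8000": {"queue_depth": 5.0, "kv_cache_utilization": 0.2},
}
```

With this snapshot, `pick_endpoint(snapshot, queue_only_score)` selects `10.0.0.1:8000`, while `pick_endpoint(snapshot, cache_aware_score)` selects `10.0.0.2:8000`. Swapping the routing experiment is a one-argument change, with no controller or networking code involved.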

## Objectives over instructions

The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too prescriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation-specific APIs and grow into standards as the concepts become more stable and widespread.

## Extend instead of reinvent

Although it’s tempting to develop an entirely new form of AI-focused Gateways, the reality is that any Gateway needs many baseline routing capabilities that have already been well defined by Kubernetes. Instead of trying to reinvent the full stack, this project should allow both networking and AI experts to focus on their respective strengths. Existing Gateways can be used for all the routing capabilities they already provide, while this extensible model can enable AI experts to focus exclusively on how an endpoint is selected.

## Additions to the API should be carefully prioritized

Every addition to the API should take the principles described above into account. Given that the goal of the API is to encourage a highly extensible ecosystem, each additional feature in the API raises the barrier to entry for any new controller or extension. Our top priority should be to focus on concepts that we expect to be broadly implementable and useful. The extensible nature of this API will allow each individual implementation to experiment with new features via custom flags or APIs before they become part of the core API surface.
