| 1 | +# Design Principles |
| 2 | + |
| 3 | +## Focus on the core interfaces |
| 4 | + |
| 5 | +There are two interfaces of note here: |
| 6 | + |
| 7 | +### 1. Gateway -> Endpoint Picker |
| 8 | +At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to. |
| 9 | + |
| 10 | +### 2. Endpoint Picker -> Model Server Framework |
| 11 | +This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics. |
| 12 | + |
| 13 | +Although we can extend these interfaces in the future, it’s critical to get these right early in the project and stabilize them as soon as possible. We want to be able to give controller and extension developers a stable target to build against. |
| 14 | + |
| 15 | + |
| 16 | +## The default out-of-the-box experience should be compelling |
| 17 | + |
| 18 | +We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization. |
| 19 | + |
| 20 | + |
| 21 | +## Encourage innovation via extensibility |
| 22 | + |
| 23 | +This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic. |
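To make the idea of "custom routing logic without a controller" concrete, here is a minimal sketch in Python. It is not the project's API: `EndpointInfo`, its fields, `least_queue_picker`, and `route` are all hypothetical names chosen for illustration, and the metrics shown are assumed examples of what "information needed to make a routing decision" might include.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EndpointInfo:
    """Hypothetical snapshot of one model server endpoint (not a real API type)."""
    address: str
    queue_depth: int              # assumed metric: requests waiting on this endpoint
    kv_cache_utilization: float   # assumed metric: fraction in [0.0, 1.0]


# A picker is just a function from candidate endpoints to the chosen endpoint.
Picker = Callable[[Sequence[EndpointInfo]], EndpointInfo]


def least_queue_picker(endpoints: Sequence[EndpointInfo]) -> EndpointInfo:
    """Pick the shortest queue; break ties by lower cache utilization."""
    if not endpoints:
        raise ValueError("no endpoints available")
    return min(endpoints, key=lambda e: (e.queue_depth, e.kv_cache_utilization))


def route(endpoints: Sequence[EndpointInfo],
          picker: Picker = least_queue_picker) -> str:
    """Return the address the Gateway would route to under the given picker."""
    return picker(endpoints).address
```

Under these assumptions, a researcher experimenting with a new strategy only writes another function with the `Picker` signature and passes it to `route`; nothing about controllers or the networking stack appears in their code.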
| 24 | + |
| 25 | + |
| 26 | +## Objectives over instructions |
| 27 | + |
| 28 | +The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of being too prescriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation-specific APIs and grow into standards as the concepts become more stable and widespread. |
| 29 | + |
| 30 | + |
| 31 | +## Extend instead of reinvent |
| 32 | + |
| 33 | +Although it’s tempting to develop an entirely new form of AI-focused Gateway, the reality is that many of the baseline routing capabilities any Gateway needs have already been well defined by Kubernetes. Instead of trying to reinvent the full stack, this project should allow both networking and AI experts to focus on their respective strengths. Existing Gateways can be used for all the routing capabilities they already provide, while this extensible model lets AI experts focus exclusively on how an endpoint is selected. |
| 34 | + |
| 35 | + |
| 36 | +## Additions to the API should be carefully prioritized |
| 37 | + |
| 38 | +Every addition to the API should take the principles described above into account. Given that the goal of the API is to encourage a highly extensible ecosystem, each additional feature in the API raises the barrier to entry for any new controller or extension. Our top priority should be to focus on concepts that we expect to be broadly implementable and useful. The extensible nature of this API will allow each individual implementation to experiment with new features via custom flags or APIs before they become part of the core API surface. |