Adding Design Principles #596

robscott · 2025-03-28T04:57:40Z

As the project continues to grow, it would be helpful to have some high-level design principles for the project. These principles can help guide us when determining which features and work to prioritize.

/cc @ahg-g @smarterclayton @danehans @kfswain @Jeffwan @shaneutt

netlify · 2025-03-28T04:57:58Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`f2c526e`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/683a27990cbf920008a7b77a
😎 Deploy Preview	https://deploy-preview-596--gateway-api-inference-extension.netlify.app

shaneutt

Thanks @robscott!

Some comments and feedback for your consideration. 🖖

shaneutt · 2025-03-28T13:58:34Z

site-src/concepts/design-principles.md

+## Encourage innovation via extensibility
+
+This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.


Love it 👍

> make it as easy as possible for AI researchers to experiment with custom scheduling...

Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call?

I think we're seeing this succeed with llm-d and the more recent #845.

shaneutt

Seems like a good place to start. I do think that some more refinement can happen, but it can be iterative and maybe we can talk a bit more about it on the community calls. 👍

k8s-ci-robot · 2025-03-28T19:31:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: robscott, shaneutt
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kfswain · 2025-03-31T03:43:06Z

site-src/concepts/design-principles.md

+
+
+## Composable components and reducing reinvention
+While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established by Kubernetes. Our focus is on creating a layer of composable components that can be assembled together with other Kubernetes components. This approach empowers engineers to use our solution as a building block—combining established technologies like Gateway API with our extensible model to build higher level solutions.


Consider rewording. Perhaps this is personal bias, but the first sentence reads as if it is meant to ward the reader off from attempting to implement something new.

The later sentences focus on the value of using what has already been built, which I think is what we are going for. Perhaps move the concept of the first sentence to the end as something like:

Should you encounter a limitation, consider how existing tooling may be extended or improved first. Suggestions always welcomed (and encouraged) at our: sync-link-goes-here.

danehans · 2025-04-03T16:55:05Z

site-src/concepts/design-principles.md

+## Encourage innovation via extensibility
+
+This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.


> make it as easy as possible for AI researchers to experiment with custom scheduling...

Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call?

danehans · 2025-05-22T15:53:47Z

@robscott checking in to see what your plans are for this PR.

robscott · 2025-05-30T21:49:09Z

> @robscott checking in to see what your plans are for this PR.

Sorry for the delay here. I finally got some time to update this, PTAL.

danehans · 2025-06-10T23:39:49Z

site-src/concepts/design-principles.md

+[Gateway API]:https://github.com/kubernetes-sigs/gateway-api
+[AI Inference]:https://www.arm.com/glossary/ai-inference
+
+


nit: Remove extra line

danehans · 2025-06-10T23:42:39Z

site-src/concepts/design-principles.md

+The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.
+
+## Composable components and reducing reinvention
+While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established by Kubernetes. Our focus is on creating a layer of composable components that can be assembled together with other Kubernetes components. This approach empowers engineers to use our solution as a building block—combining established technologies like Gateway API with our extensible model to build higher level solutions. Should you encounter a limitation, consider how existing tooling may be extended or improved first.


nit: Add a space after the heading.

danehans · 2025-06-10T23:43:02Z

site-src/concepts/design-principles.md

+At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.
+
+### 2. Endpoint Picker -> Model Server Framework
+This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.


nit: Add a space after the heading.

danehans · 2025-06-10T23:43:23Z

site-src/concepts/design-principles.md

+When describing "core interfaces", we are referring to both of the following:
+
+### 1. Gateway -> Endpoint Picker
+At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.


nit: Add a space after the heading.

danehans · 2025-06-10T23:44:14Z

A few nits and CI is failing.

kfswain · 2025-06-13T20:26:21Z

/lgtm

kfswain · 2025-06-13T20:26:53Z

/retest

k8s-ci-robot requested review from ahg-g, danehans, Jeffwan, kfswain, shaneutt and smarterclayton March 28, 2025 04:57

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 28, 2025

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 28, 2025

shaneutt suggested changes Mar 28, 2025

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 28, 2025

robscott force-pushed the design-principles branch from 5127127 to 7e469c9 Compare March 28, 2025 17:54

shaneutt approved these changes Mar 28, 2025

kfswain reviewed Mar 31, 2025

danehans reviewed Apr 3, 2025

Adding Design Principles

f2c526e

robscott force-pushed the design-principles branch from 7e469c9 to f2c526e Compare May 30, 2025 21:48

danehans reviewed Jun 10, 2025

k8s-ci-robot assigned kfswain Jun 13, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 13, 2025
