RFC: Simplify `flytekit.Resources` #6245

granthamtaylor · 2025-02-14T18:46:45Z

Defining resources for tasks is error-prone and counter-intuitive due to the nuances around requests and limits.

This RFC discusses an alternative solution to simplify the definition of resources, remove some footguns, and smoothen flytekit altogether.

Signed-off-by: Grantham Taylor <[email protected]>

welcome · 2025-02-14T18:46:48Z

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
Sign off your commits (Reference: DCO Guide).

flyte-bot · 2025-02-14T18:47:08Z

Code Review Agent Run Status

Limitations and other issues: ❌ Failure - The AI Code Review Agent skipped reviewing this change because it is configured to exclude certain pull requests based on the source/target branch or the pull request status. You can change the settings here, or contact the agent instance creator at [email protected].

codecov · 2025-02-14T18:50:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 36.87%. Comparing base (b04df59) to head (fa3fb0a).

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #6245       +/-   ##
===========================================
- Coverage   50.51%   36.87%   -13.64%     
===========================================
  Files        1162     1318      +156     
  Lines       91811   134703    +42892     
===========================================
+ Hits        46375    49672     +3297     
- Misses      41340    80703    +39363     
- Partials     4096     4328      +232

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests-datacatalog | `51.58% <ø> (ø)` | |
| unittests-flyteadmin | `51.96% <ø> (+0.02%)` | ⬆️ |
| unittests-flytecopilot | `30.99% <ø> (?)` | |
| unittests-flytectl | `62.29% <ø> (ø)` | |
| unittests-flyteidl | `7.22% <ø> (?)` | |
| unittests-flyteplugins | `54.03% <ø> (ø)` | |
| unittests-flytepropeller | `42.77% <ø> (-0.01%)` | ⬇️ |
| unittests-flytestdlib | `55.35% <ø> (+0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

cjidboon94 · 2025-02-14T21:14:58Z

This RFC would be a big loss for me. I love having control over reqs/lims directly.
The suggestion in the RFC could be a new golden path, sure, if it's supposed to "smoothen" things,
while keeping the classic requests/limits setting for power users and people who think they know what they are doing, with a warning in code and documentation that it's for advanced users.

granthamtaylor · 2025-02-16T15:23:16Z

> This RFC would be a big loss for me. I love having control over reqs/lims directly.
> The suggestion in the RFC could be a new golden path, sure, if it's supposed to "smoothen" things,
> while keeping the classic requests/limits setting for power users and people who think they know what they are doing, with a warning in code and documentation that it's for advanced users.

Hi! Thank you for sharing. Perhaps, going forward, we can allow power users to define the request and limit separately as a tuple, e.g. `("1Gi", "2Gi")`. I think this is preferable to the current pattern of two separate arguments (`requests` and `limits`) because a single argument is easy to validate within `Resources`. Additionally, I think this will help keep new users from hurting themselves.

Is it just mem and cpu that you want to have complete control over?
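A minimal sketch of the tuple idea above, assuming a bare string means request == limit and a 2-tuple means `(request, limit)`. The names `parse_quantity`, `split_spec`, and `SimpleResources` are hypothetical and are not part of flytekit; the suffix table covers only a subset of Kubernetes quantity suffixes.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple, Union

# Subset of Kubernetes quantity suffixes (illustrative, not exhaustive).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Fraction(1, 1000),
}

Spec = Union[str, Tuple[str, str]]


def parse_quantity(quantity: str) -> Fraction:
    """Convert a quantity such as '400m' or '1Gi' into an exact number."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M' or 'm'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Fraction(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(quantity)


def split_spec(spec: Spec) -> Tuple[str, str]:
    """Return (request, limit); a bare string means request == limit."""
    if isinstance(spec, str):
        parse_quantity(spec)  # raises ValueError on malformed input
        return spec, spec
    request, limit = spec
    if parse_quantity(request) > parse_quantity(limit):
        raise ValueError(f"request {request!r} exceeds limit {limit!r}")
    return request, limit


@dataclass(frozen=True)
class SimpleResources:
    """Hypothetical stand-in for a simplified Resources object."""

    cpu: Spec = "1"
    mem: Spec = "1Gi"

    def __post_init__(self) -> None:
        # Validate every field once, at construction time.
        split_spec(self.cpu)
        split_spec(self.mem)
```

With this shape, `SimpleResources(mem=("1Gi", "2Gi"))` is accepted while `SimpleResources(mem=("2Gi", "1Gi"))` raises at construction time, which is the single-argument validation the comment has in mind.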

katrogan · 2025-02-18T08:36:39Z

rfc/system/0000-simplify-resources.md

+
+Some of the footguns around `requests` and `limits` are non-deterministic and not intuitive, thus can escape detection during development but have significant and detrimental impact in production.
+
+For example, the possibility of setting `requests < limits` for `mem` can lead to a pod becoming overloaded. For `cpu`, doing so can lead to CPU availability, for some multi-processing applications, to be non-deterministic, thus resulting in unpredictable performance degradation.


For the memory case in particular, when requests != limits the pod is considered Burstable (a quality-of-service designation) and is more likely to be evicted when the node is under memory pressure.
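The quality-of-service designation mentioned above can be sketched as a small classifier. This is a simplification under stated assumptions: it handles a single-container pod, compares quantity strings literally (so `"1Gi"` and `"1024Mi"` count as different), and the function name `qos_class` is hypothetical. It only illustrates the Kubernetes rule that a pod is Guaranteed when both `cpu` and `memory` have limits with matching requests.

```python
from typing import Mapping, Optional

# Only cpu and memory take part in the QoS classification.
_QOS_RESOURCES = ("cpu", "memory")


def qos_class(
    requests: Optional[Mapping[str, str]],
    limits: Optional[Mapping[str, str]],
) -> str:
    """Classify a single-container pod as Guaranteed, Burstable, or BestEffort."""
    req = {k: v for k, v in (requests or {}).items() if k in _QOS_RESOURCES}
    lim = {k: v for k, v in (limits or {}).items() if k in _QOS_RESOURCES}

    if not req and not lim:
        return "BestEffort"

    # A limit without an explicit request implies request == limit.
    for name, value in lim.items():
        req.setdefault(name, value)

    guaranteed = all(
        name in lim and req.get(name) == lim[name] for name in _QOS_RESOURCES
    )
    return "Guaranteed" if guaranteed else "Burstable"
```

For example, `qos_class({"cpu": "1", "memory": "1Gi"}, {"cpu": "1", "memory": "2Gi"})` returns `"Burstable"` because the memory request and limit differ.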

katrogan · 2025-02-18T08:38:40Z

rfc/system/0000-simplify-resources.md

+
+## 6 Alternatives
+
+Should any resource type (`CPU`) need both `limits` and `requests`, we should allow definition of them via a `tuple` (IE `(400m, 600m)`).


+1 to having an escape hatch but prioritizing the common use-case

katrogan · 2025-02-18T08:39:59Z

@cjidboon94 that's really helpful feedback. Do you mind expanding on the circumstances in which you prefer to set requests != limits?

fg91 · 2025-02-27T18:21:38Z

I agree that the UX of the task arguments `@task(requests=..., limits=..., accelerator=..., shared_memory=...)` is definitely very Kubernetes-centric:

- separate requests and limits directly translate to pod requests/limits
- accelerator is not part of requests/limits because it translates to a node selector/affinity and toleration
- shared memory is not part of requests/limits because it translates to a volume + mount
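To make the mapping above concrete, here is a rough sketch of how the four arguments could land in different parts of a pod spec. It is not what Flyte actually generates: the function `pod_fragment`, the container name, the volume name, and the label key `example.com/accelerator` are all hypothetical, and only the Kubernetes field names (`nodeSelector`, `tolerations`, `emptyDir`, `volumeMounts`) are real.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical node label / taint key; real clusters use provider-specific keys.
ACCELERATOR_LABEL = "example.com/accelerator"


def pod_fragment(
    requests: Mapping[str, str],
    limits: Mapping[str, str],
    accelerator: Optional[str] = None,
    shared_memory: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an illustrative pod spec fragment from task-level arguments."""
    # requests/limits map one-to-one onto the container's resources block.
    container: Dict[str, Any] = {
        "name": "task",
        "resources": {"requests": dict(requests), "limits": dict(limits)},
    }
    spec: Dict[str, Any] = {"containers": [container]}

    if accelerator is not None:
        # An accelerator is a scheduling constraint, not a resource quantity.
        spec["nodeSelector"] = {ACCELERATOR_LABEL: accelerator}
        spec["tolerations"] = [
            {
                "key": ACCELERATOR_LABEL,
                "operator": "Equal",
                "value": accelerator,
                "effect": "NoSchedule",
            }
        ]

    if shared_memory is not None:
        # Shared memory is a memory-backed volume mounted at /dev/shm.
        spec["volumes"] = [
            {"name": "shm", "emptyDir": {"medium": "Memory", "sizeLimit": shared_memory}}
        ]
        container["volumeMounts"] = [{"name": "shm", "mountPath": "/dev/shm"}]

    return spec
```

The three branches touch three different parts of the spec, which is why the current task arguments are split the way they are.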

I also agree that for ML engineers/Data Scientists who don't know/care about K8s, the UX can feel a bit awkward. Especially the intricacies of requests vs. limits.

We maintain a decorator wrapping the flytekit task decorator which derives the limits from the requests in a way that makes sense for us. Our users never use the limits arg.

At the same time, I feel for platform engineers it absolutely makes sense to have full control over both requests and limits.

To summarize:

From an ML engineer's perspective, I would favor:

- Aggregation of cpu, memory, gpu, shared memory, ephemeral storage, ... into a single struct
- Sane default limits derived from the specified requests, so that I as an ML engineer don't have to think about K8s intricacies.

From a platform engineer's perspective:

- I would like to be able to override the requests/limits derived by default from the specified resources.

All of this being said, this is a very breaking change which needs to be considered carefully.
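The two perspectives summarized above could coexist in one struct along these lines. This is only a sketch: the class `TaskResources`, its field names, and the `limit_overrides` escape hatch are hypothetical, and the default policy shown here (limit equals request) is an assumption, since the thread does not say how limits should be derived.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class TaskResources:
    """Hypothetical single struct for all task resources."""

    cpu: str = "1"
    mem: str = "1Gi"
    gpu: Optional[str] = None
    ephemeral_storage: Optional[str] = None
    # Escape hatch for platform engineers: explicit per-resource limits.
    limit_overrides: Dict[str, str] = field(default_factory=dict)

    def requests(self) -> Dict[str, str]:
        """Requests are exactly what the user specified."""
        values = {
            "cpu": self.cpu,
            "memory": self.mem,
            "gpu": self.gpu,  # illustrative key, not a real K8s resource name
            "ephemeral-storage": self.ephemeral_storage,
        }
        return {name: value for name, value in values.items() if value is not None}

    def limits(self) -> Dict[str, str]:
        """Derive limits from requests, then apply any explicit overrides."""
        limits = self.requests()  # assumed default policy: limit == request
        unknown = set(self.limit_overrides) - set(limits)
        if unknown:
            raise ValueError(f"override for unrequested resources: {sorted(unknown)}")
        limits.update(self.limit_overrides)
        return limits
```

An ML engineer would write `TaskResources(cpu="2", mem="4Gi")` and never see limits, while a platform engineer could write `TaskResources(cpu="500m", limit_overrides={"cpu": "2"})` to allow CPU bursting.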

Create 0000-simplify-resources.md

fa3fb0a

Signed-off-by: Grantham Taylor <[email protected]>

granthamtaylor requested review from katrogan and davidmirror-ops February 14, 2025 18:47

davidmirror-ops added the `rfc` label (A label for RFC issues) Feb 14, 2025

katrogan reviewed Feb 18, 2025
