-
Notifications
You must be signed in to change notification settings - Fork 1k
add interpreter-notebook #6814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add interpreter-notebook #6814
Conversation
Summary of ChangesHello @dekaihu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates support for Kubeflow Notebook resources by adding a new resource interpreter. This interpreter provides essential logic for health checking, status aggregation, and status reflection, which are crucial for managing and observing Notebook instances within a Kubernetes environment. The addition ensures that Notebooks can be properly monitored and their states accurately represented. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
5ed377c to
28ccf7a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a resource interpreter customization for Kubeflow Notebooks. The implementation has several issues in the Lua scripts for health interpretation, status aggregation, and status reflection that could lead to incorrect behavior. I've provided suggestions to fix these issues and improve the robustness of the scripts. Additionally, there's a minor inconsistency in the test configuration that should be addressed.
...reter/default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/customizations.yaml
Show resolved
Hide resolved
...reter/default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/customizations.yaml
Show resolved
Hide resolved
...reter/default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/customizations.yaml
Show resolved
Hide resolved
...default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/customizations_tests.yaml
Show resolved
Hide resolved
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #6814 +/- ##
==========================================
+ Coverage 45.73% 46.22% +0.49%
==========================================
Files 689 692 +3
Lines 57104 47194 -9910
==========================================
- Hits 26114 21817 -4297
+ Misses 29358 23723 -5635
- Partials 1632 1654 +22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
9d9707f to
44e4548
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a resource interpreter customization for Kubeflow Notebooks. The changes include the customization definition with Lua scripts for health interpretation, status aggregation, and status reflection, along with corresponding tests and test data.
My review of the Lua scripts has identified a couple of issues that could affect the correctness and robustness of the interpreter:
- In the
statusAggregationscript, thecontainerStateis being overwritten in each loop iteration, which will result in loss of information from all but the last member cluster. - The
statusReflectionscript performs unsafe access to status fields, which could lead to an inconsistent structure in the returned status if some fields are missing in the observed object.
I have provided specific suggestions to address these points. After these fixes, the PR should be in good shape.
...reter/default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/customizations.yaml
Outdated
Show resolved
Hide resolved
...reter/default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/customizations.yaml
Outdated
Show resolved
Hide resolved
18f0f45 to
14f50ea
Compare
...reter/default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/customizations.yaml
Show resolved
Hide resolved
|
For |
Notebook is a single pod template task that only relies on the pod volume mount, and the mount data generally exists in the member cluster. There should be no scenario where it relies on resource propagation. |
Okay, if new dependent resources are identified later, we can continue iterating. |
e711f27 to
23f9b07
Compare
|
karmada control surface results: member1: member2: |
|
Hi @dekaihu could you help explain this process a bit? |
OK, Notebook is a single-pod job that provides services such as Jupyter. If scheduled on a single cluster, the Karmada control plane status aggregation defaults to all fields being aggregated, directly returning the member cluster's status. For multi-cluster distribution, the status aggregation is not considered Ready until all member clusters reach Ready status, starting with the latest startup time. If services in some member clusters are not Ready, the aggregated status should be UnReady. Similarly, when aggregating data in a multi-member cluster, we define a comprehensive set of conditions for the conditions field, which is displayed as the result to the user. |
|
|
||
| if state.waiting ~= nil then | ||
| local reason = state.waiting.reason or "" | ||
| if reason == "ContainerCreating" then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the possible values for "reason"? Is there any possibility that some values are missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common Reason values for the Notebook in the Waiting state include ContainerCreating, ErrImagePull, ImagePullBackOff, CreateContainerError, and CrashLoopBackOff. Here, I only regard ContainerCreating as the state where the Notebook is still being created normally.
| if aggregatedContainerState == nil or aggregatedContainerState.running ~= nil then | ||
| aggregatedContainerState = st.containerState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this situation lead to having both running and waiting/terminated states at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation, I get it.
|
Thanks~ Can you help add the release note? /cc @RainbowMango |
668d75c to
4722e9c
Compare
|
Sorry, I didn't notice what was changed. |
ce57d0a to
8069331
Compare
Sorry, I just fixed the aggregation of empty containerState. Please refer to the following aggregation example |
karmada aggregation results: member1: member2: |
...default/thirdparty/resourcecustomizations/kubeflow.org/v1/Notebook/testdata/status-file.yaml
Show resolved
Hide resolved
8ed8297 to
ebd2b2b
Compare
Signed-off-by: hudekai <[email protected]>
d13cd61 to
dcc2735
Compare
|
Hi @dekaihu, If you're ready, you can cc me. |
|
@XiShanYongYe-Chang It's ready. The CI process e2e test occasionally fails. Is there a way to re-trigger the e2e test without submitting an empty commit? |
You need to push again; it's enough if the commit ID has changed. For example, you can use |
OK, thanks |
|
Thanks~ |
RainbowMango
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/assign
RainbowMango
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: RainbowMango The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This pull request introduces a resource interpreter customization for Kubeflow Notebooks. The changes include the customization definition with Lua scripts for health interpretation, status aggregation, and status reflection, along with corresponding tests and test data.
Which issue(s) this PR fixes:
Fixes #
Part of #6589
Special notes for your reviewer:
Does this PR introduce a user-facing change?: