[FEAT] add driver/executor pod in Spark #3016
Force-pushed from 2ff8b9a to af03383.
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Force-pushed from af03383 to 7793398.
Code Review Agent Run #3c7587
Actionable Suggestions - 2
Additional Suggestions - 1
Review Details
Changelist by Bito
This pull request implements the following key changes.
@@ -176,6 +185,22 @@ def get_custom(self, settings: SerializationSettings) -> Dict[str, Any]:

        return MessageToDict(job.to_flyte_idl())

    def to_k8s_pod(self, pod_template: PodTemplate | None, settings: SerializationSettings) -> K8sPod | None:
Consider adding type hints for the return value of _get_container() in the to_k8s_pod() method. The method appears to use this internal method but its return type is not clearly specified in the type hints.
Code suggestion
Check the AI-generated fix before applying
def to_k8s_pod(self, pod_template: PodTemplate | None, settings: SerializationSettings) -> K8sPod | None:
    from flytekit.models import task as _task_model
    _get_container: Callable[..., _task_model.Container] = self._get_container
Code Review Run #3c7587
Take the container with the name set in the driver/executor podTemplate primary_container_name
Signed-off-by: machichima <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3016       +/-   ##
===========================================
- Coverage   80.01%   46.67%   -33.34%
===========================================
  Files         318      319        +1
  Lines       27075    26695      -380
  Branches     2779     2806       +27
===========================================
- Hits        21663    12461     -9202
- Misses       4647    14123     +9476
+ Partials      765      111      -654
☔ View full report in Codecov by Sentry.
Code Review Agent Run #f512d4
Actionable Suggestions - 0
Review Details
Exclude those in the podTemplate of spark driver/executor pod
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Code Review Agent Run #27c6ae
Actionable Suggestions - 2
Review Details
flytekit/core/utils.py
Outdated
if task_type != "spark":
    # for spark driver/executor, do not use the command and args from task podTemplate
    container.command = primary_container.command
    container.args = primary_container.args
Consider extracting the Spark-specific container command/args logic into a separate helper function to improve code organization and readability. The current nested if condition makes the code harder to follow.
Code suggestion
Check the AI-generated fix before applying
- if task_type != "spark":
- # for spark driver/executor, do not use the command and args from task podTemplate
- container.command = primary_container.command
- container.args = primary_container.args
+ if _should_copy_container_command_args(task_type):
+ container.command = primary_container.command
+ container.args = primary_container.args
+
+ def _should_copy_container_command_args(task_type: str) -> bool:
+ # for spark driver/executor, do not use the command and args from task podTemplate
+ return task_type != "spark"
Code Review Run #27c6ae
    pod_spec=driver_pod_spec_dict_remove_None,  # type: ignore
)

target_executor_k8sPod = K8sPod(
    metadata=K8sObjectMetadata(
        labels={"lKeyA_e": "lValA", "lKeyB_e": "lValB"},
        annotations={"aKeyA_e": "aValA", "aKeyB_e": "aValB"},
    ),
    pod_spec=executor_pod_spec_dict_remove_None,  # type: ignore
Consider removing the # type: ignore comments and properly typing the pod_spec parameter to match the expected type.
Code suggestion
Check the AI-generated fix before applying
- pod_spec=driver_pod_spec_dict_remove_None, # type: ignore
+ pod_spec=V1PodSpec(**driver_pod_spec_dict_remove_None),
@@ -378,1 +378,1 @@
- pod_spec=executor_pod_spec_dict_remove_None, # type: ignore
+ pod_spec=V1PodSpec(**executor_pod_spec_dict_remove_None),
Code Review Run #27c6ae
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Code Review Agent Run #41dd0b
Actionable Suggestions - 1
Review Details
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Force-pushed from c6a8f94 to d6b752b.
Signed-off-by: machichima <[email protected]>
Code Review Agent Run Status
flytekit/core/utils.py
Outdated
@@ -176,8 +177,10 @@ def _serialize_pod_spec(
            else:
                container.image = get_registerable_container_image(container.image, settings.image_config)

            container.command = primary_container.command
            container.args = primary_container.args
            if task_type != "spark":
We can use this function to create a k8sPod from podTemplate.
flytekit/flytekit/models/task.py
Lines 1079 to 1083 in 2ef875c
def from_pod_template(cls, pod_template: "PodTemplate") -> "K8sPod":
    return cls(
        metadata=K8sObjectMetadata(labels=pod_template.labels, annotations=pod_template.annotations),
        pod_spec=ApiClient().sanitize_for_serialization(pod_template.pod_spec),
    )
Thanks for the information! I switched to using this function and removed task_type from _serialize_pod_spec.
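The conversion performed by the quoted from_pod_template can be sketched without flytekit or the Kubernetes client. Everything in the sketch below is a hypothetical stand-in: PodTemplateSketch and K8sPodSketch are not flytekit classes, and to_plain_dict only approximates what a serializer such as sanitize_for_serialization is assumed to produce, namely plain dictionaries with unset fields left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def to_plain_dict(value: Any) -> Any:
    """Recursively copy dicts and lists, dropping dict entries whose value is None."""
    if isinstance(value, dict):
        return {k: to_plain_dict(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [to_plain_dict(v) for v in value]
    return value


@dataclass
class PodTemplateSketch:
    # Hypothetical stand-in for flytekit's PodTemplate; pod_spec is a plain dict here.
    pod_spec: Dict[str, Any] = field(default_factory=dict)
    primary_container_name: str = "primary"
    labels: Optional[Dict[str, str]] = None
    annotations: Optional[Dict[str, str]] = None


@dataclass
class K8sPodSketch:
    # Hypothetical stand-in for K8sPod: object metadata plus a serialized pod spec.
    metadata: Dict[str, Optional[Dict[str, str]]]
    pod_spec: Dict[str, Any]

    @classmethod
    def from_pod_template(cls, pod_template: PodTemplateSketch) -> "K8sPodSketch":
        # Labels and annotations go to metadata; the pod spec is serialized as a whole.
        return cls(
            metadata={"labels": pod_template.labels, "annotations": pod_template.annotations},
            pod_spec=to_plain_dict(pod_template.pod_spec),
        )


executor_template = PodTemplateSketch(
    pod_spec={"containers": [{"name": "executor", "image": None}], "tolerations": None},
    labels={"lKeyA_e": "lValA"},
)
executor_pod = K8sPodSketch.from_pod_template(executor_template)
```

Because the whole template is converted in one place, the caller no longer needs a task_type switch to decide which container fields to copy.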
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Code Review Agent Run Status
@machichima any chance we could expand the Spark plugin docs to include your example?
Sure! Is it ok to add the pod_template settings into the existing
…-driver-executor-podtemplate
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Force-pushed from 01bc98a to 7f4e00b.
Code Review Agent Run Status
The docs are updated here: flyteorg/flytesnacks#1782
Tracking issue
Related to flyteorg/flyte#4105
Why are the changes needed?
This PR updates the flytekit-spark package so that the driver pod and the executor pod can be configured separately using PodTemplate. It also allows each of them to set its own primary_container_name, independent of the task podTemplate.
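A minimal sketch of what a separate primary_container_name means in practice: each pod template names its own primary container, and that name is used to pick the container out of the template's container list. The helper find_primary_container and the dict layout are hypothetical and are not taken from flytekit.

```python
from typing import Any, Dict, List


def find_primary_container(containers: List[Dict[str, Any]], primary_container_name: str) -> Dict[str, Any]:
    """Return the container whose name matches primary_container_name."""
    for container in containers:
        if container.get("name") == primary_container_name:
            return container
    raise ValueError(f"no container named {primary_container_name!r} in pod spec")


# Hypothetical driver pod spec with a sidecar next to the primary container.
driver_containers = [
    {"name": "log-sidecar", "image": "busybox"},
    {"name": "driver-primary", "image": "ghcr.io/example/spark:latest"},
]
primary = find_primary_container(driver_containers, "driver-primary")
```

Raising on a missing name, rather than falling back to the first container, is a design choice of this sketch; it makes a mistyped primary_container_name fail loudly.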
What changes were proposed in this pull request?
Add driver_pod and executor_pod fields of type PodTemplate to SparkJob.
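The shape of this change can be illustrated with a small stand-in model. SparkJobSketch below is hypothetical and uses plain dicts instead of PodTemplate objects; the key names in the serialized output are assumptions for illustration, not the plugin's wire format. It shows the two fields being optional and independent, so a job can customize only the executor, only the driver, both, or neither.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SparkJobSketch:
    # Hypothetical stand-in for SparkJob; pods are plain dicts, not PodTemplate objects.
    spark_conf: Dict[str, str]
    driver_pod: Optional[Dict[str, Any]] = None
    executor_pod: Optional[Dict[str, Any]] = None

    def to_custom(self) -> Dict[str, Any]:
        """Serialize the job, adding a pod entry only for roles that were configured."""
        custom: Dict[str, Any] = {"spark_conf": dict(self.spark_conf)}
        if self.driver_pod is not None:
            custom["driver_pod"] = self.driver_pod
        if self.executor_pod is not None:
            custom["executor_pod"] = self.executor_pod
        return custom


# Only the executor gets a toleration; the driver keeps its defaults.
job = SparkJobSketch(
    spark_conf={"spark.executor.instances": "2"},
    executor_pod={"tolerations": [{"key": "gpu", "operator": "Exists", "effect": "NoSchedule"}]},
)
custom = job.to_custom()
```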
How was this patch tested?
test_spark_driver_executor_podSpec
Modified the @task for the hello_spark function in the my_spark example to set the driver_pod and executor_pod, then verified that the pods have the Tolerations and EnvVar set.
Related PRs
flyteorg/flyte#6085
Docs link
Summary by Bito
Enhanced flytekit-spark package by implementing configurable driver and executor pod support through PodTemplate. Added driver_pod and executor_pod fields to SparkJob model with primary_only flag for pod spec serialization. The implementation includes type hint updates from K8sPod to PodTemplate, parameter order modifications, and improved SparkSession cleanup in tests. This enables granular control and customization of labels, annotations, containers, and tolerations for both driver and executor pods.
Unit tests added: True
Estimated effort to review (1-5, lower is better): 2