
Conversation

liaolecheng
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
Part of #6587

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Introduced built-in interpreter for SparkApplication.

@Copilot review requested due to automatic review settings on October 10, 2025 13:54
@karmada-bot added the kind/feature label (Categorizes issue or PR as related to a new feature.) on Oct 10, 2025

Summary of Changes

Hello @liaolecheng, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a new built-in resource interpreter for SparkApplication objects into Karmada. This feature significantly improves Karmada's understanding and management capabilities for Spark workloads by providing custom logic for health checks, detailed resource component extraction, accurate replica counting, and comprehensive status synchronization. These additions enable more robust and efficient orchestration of Spark applications across multiple clusters.

Highlights

  • Built-in Interpreter for SparkApplication: Introduced a new built-in resource interpreter for SparkApplication resources (apiVersion: sparkoperator.k8s.io/v1beta2), enhancing Karmada's ability to manage Spark workloads.
  • Custom Health Interpretation: Implemented a Lua script for healthInterpretation that determines the health of a SparkApplication based on its status.applicationState.state, marking it healthy if not 'PENDING' or 'UNKNOWN'.
  • Detailed Component Resource Extraction: Added Lua logic to componentResource to extract detailed resource requests (CPU, memory, GPU) and pod template configurations (nodeSelector, tolerations, affinity, priorityClassName) for both Spark driver and executor components.
  • Replica and Aggregated Resource Calculation: Provided a replicaResource Lua script to calculate the total number of replicas (driver + executors) and aggregate resource requirements by taking the maximum CPU and memory requests between the driver and executors (see the sketch after this list).
  • Status Aggregation and Reflection: Included Lua scripts for statusAggregation and statusReflection to correctly aggregate and reflect key status fields of SparkApplication objects, such as sparkApplicationId, applicationState, completionTime, driverInfo, executorState, and LastSubmissionAttemptTime.
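
For context, the replica aggregation described in the highlights could look roughly like the sketch below. This is a minimal illustration assuming Spark Operator v1beta2 field names (spec.driver.cores, spec.executor.cores, spec.executor.instances) and Karmada's GetReplicas entry point for replicaResource scripts; it is not the PR's actual script, and the memory handling is deliberately simplified because Lua cannot compare quantity strings like "512m" with math.max.

function GetReplicas(desiredObj)
    -- Total replicas: one driver plus the requested executor instances
    -- (the Spark Operator defaults executor instances to 1).
    local driver = desiredObj.spec.driver or {}
    local executor = desiredObj.spec.executor or {}
    local replicas = 1 + (executor.instances or 1)
    -- Per-replica requirement: take the larger CPU request of driver vs.
    -- executor; memory is passed through unchanged here for simplicity.
    local requirement = {
        resourceRequest = {
            cpu = math.max(driver.cores or 1, executor.cores or 1),
            memory = driver.memory or executor.memory,
        },
    }
    return replicas, requirement
end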
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@karmada-bot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) on Oct 10, 2025
Contributor

@Copilot left a comment


Pull Request Overview

This PR introduces a built-in resource interpreter for SparkApplication resources from the Spark Operator (v1beta2) to support better resource management in Karmada. The implementation provides customized interpretation logic for health status, resource requirements, replica counting, and status aggregation specific to Spark applications.

  • Adds comprehensive Lua scripts for interpreting SparkApplication health, replicas, and component resources
  • Implements status aggregation and reflection logic for SparkApplication status fields
  • Provides test data and test configuration to validate the interpreter functionality

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File | Description
customizations.yaml | Core interpreter configuration with Lua scripts for health, replicas, components, and status handling
customizations_tests.yaml | Test configuration defining operations to validate the interpreter
desired-sparkapplication.yaml | Test data representing the desired SparkApplication spec
observed-sparkapplication.yaml | Test data representing an observed SparkApplication with status
status-file.yaml | Test data containing status information for aggregation testing

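For readers who have not opened the files, a customizations.yaml of this kind generally follows Karmada's ResourceInterpreterCustomization schema. The outline below is a rough sketch under that assumption — the metadata name is hypothetical, the componentResource field name mirrors the InterpretComponent operation exercised later in this thread, and all script bodies are elided; consult the PR's actual file for the real contents.

apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: declarative-configuration-sparkapplication   # hypothetical name
spec:
  target:
    apiVersion: sparkoperator.k8s.io/v1beta2
    kind: SparkApplication
  customizations:
    healthInterpretation:
      luaScript: |
        function InterpretHealth(observedObj)
          -- health logic elided
        end
    replicaResource:
      luaScript: |
        function GetReplicas(desiredObj)
          -- replica/resource logic elided
        end
    componentResource:
      luaScript: |
        function InterpretComponent(desiredObj)
          -- driver/executor component extraction elided
        end
    statusReflection:
      luaScript: |
        function ReflectStatus(observedObj)
          -- status extraction elided
        end
    statusAggregation:
      luaScript: |
        function AggregateStatus(desiredObj, statusItems)
          -- aggregation logic elided
        end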



@gemini-code-assist bot left a comment


Code Review

This pull request introduces a default resource interpreter for SparkApplication resources, which is a great addition. The implementation covers health interpretation, component resource discovery, replica counting, status aggregation, and status reflection using Lua scripts. The overall approach is solid, but I've found several issues in the Lua scripts related to correctness and robustness, such as incorrect field names (camelCase vs. PascalCase), unsafe access to object properties, and use of global variables. I've left detailed comments with suggestions to address these points. Additionally, one of the test data files contains a field with an incorrect name, which should also be corrected.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a default resource interpreter for SparkApplication resources. The implementation is comprehensive, covering health interpretation, resource discovery for components, replica counting, status aggregation, and status reflection using Lua scripts. The changes are well-structured and include necessary test files. My review focuses on improving the Lua scripts for correctness and maintainability, as well as enhancing the test coverage and fixing minor issues in the test data. The most critical feedback is regarding the AggregateStatus function, which has flawed logic and uses global variables.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a resource interpreter for the SparkApplication CRD. The implementation contains several bugs in the Lua scripts for resource and status interpretation, including incorrect field names, improper use of helper functions, lack of nil-safety checks, and flawed status aggregation logic. I've provided detailed comments and suggestions to address these critical issues.
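
To illustrate the class of Lua issues these reviews flag — globals leaking between invocations and unguarded field access on possibly-nil tables — here is a generic before/after example. It is not code from the PR; the function names are made up for illustration.

-- Pitfall: 'count' is assigned without 'local', so it becomes a global that
-- persists across calls, and observedObj.status may be nil, which makes the
-- chained '.applicationState' access raise an error.
function BadReflectStatus(observedObj)
    count = (count or 0) + 1
    return observedObj.status.applicationState
end

-- Safer: declare locals and nil-check each step of the chain before use.
function GoodReflectStatus(observedObj)
    local status = observedObj and observedObj.status
    if not status then
        return nil
    end
    return status.applicationState
end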

@codecov-commenter

codecov-commenter commented Oct 10, 2025


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.88%. Comparing base (070927d) to head (bf3dc32).
⚠️ Report is 10 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6818      +/-   ##
==========================================
+ Coverage   45.84%   45.88%   +0.03%     
==========================================
  Files         690      690              
  Lines       57300    57392      +92     
==========================================
+ Hits        26271    26333      +62     
- Misses      29399    29423      +24     
- Partials     1630     1636       +6     
Flag | Coverage | Δ
unittests | 45.88% <ø> | +0.03% ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@liaolecheng force-pushed the spark branch 2 times, most recently from 4f95e73 to 9402ef8, on October 10, 2025 14:32
Member

@RainbowMango left a comment


/assign

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liaolecheng force-pushed the spark branch 3 times, most recently from ac4c55d to 857206a, on October 11, 2025 02:28
Member

@RainbowMango left a comment


I just looked at the InterpretHealth. Please cc me once it's ready for review, as I see you are updating this PR right now.

Please add a test report on how to test it.

Comment on lines 12 to 27
function InterpretHealth(observedObj)
    if observedObj and observedObj.status and observedObj.status.applicationState and observedObj.status.applicationState.state then
        local state = observedObj.status.applicationState.state
        return state ~= 'PENDING' and state ~= 'UNKNOWN'
    end
    return false
end

Suggested change

function InterpretHealth(observedObj)
    if not observedObj or
        not observedObj.status or
        not observedObj.status.applicationState or
        not observedObj.status.applicationState.state then
        return false
    end
    -- Only the 'FAILED' state is considered unhealthy. All other states are treated
    -- as healthy or recoverable.
    local state = observedObj.status.applicationState.state
    if state == 'FAILED' then
        return false
    end
    return true
end

@liaolecheng force-pushed the spark branch 2 times, most recently from 13ad9f5 to 7593da0, on October 11, 2025 04:08
@liaolecheng
Contributor Author

@RainbowMango I've revised the code, and it's now ready for review. I've been quite busy with other matters recently, but I’ll make sure to provide the test report (detailing how to test it) within the next couple of days. Thank you for your understanding and patience!

@liaolecheng
Contributor Author

liaolecheng commented Oct 12, 2025

Test Report

E2E Testing

Test Steps

  1. Apply the SparkApplication CRD on the Karmada control plane.
  2. Install the Spark Operator on the member1 cluster.
  3. Submit the spark-pi.yaml file on the Karmada control plane. The file content is as follows:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:3.5.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  sparkVersion: "3.5.0"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 2
    memory: "512m"
    labels:
      version: 3.5.0
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.5.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  4. Submit the spark-pi-pp.yaml file on the Karmada control plane. The file content is as follows:
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: spark-pi-pp
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication
      name: spark-pi
  placement:
    clusterAffinity:
      clusterNames:
        - member1

Test Result

  1. Check the interpreted components to verify the correctness of multi-component resource parsing.
(screenshot)
  2. Confirm that the aggregatedStatus correctly summarizes the health of resources across member clusters:
(screenshot)
  3. Check the spark-pi application on the Karmada control plane and member cluster respectively to verify the correctness of status parsing.
(screenshot)

Unit Testing

Test Steps

  1. Modify testdata/desired-sparkapplication.yaml to add GPU resources:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:3.5.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  sparkVersion: "3.5.0"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 2
    memory: "512m"
    gpu:
      name: "amd.com/gpu"
      quantity: 1 
    labels:
      version: 3.5.0
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    gpu:
      name: "nvidia.com/gpu"
      quantity: 1
    labels:
      version: 3.5.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  2. Use the following command to verify that GPU resources can be correctly interpreted:
karmadactl interpret -f customizations.yaml \
  --desired-file testdata/desired-sparkapplication.yaml \
  --operation InterpretComponent

Test Result

(screenshot)

…zations for SparkApplication

Signed-off-by: liaolecheng <[email protected]>
