
Conversation

liaolecheng
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
Part of #6587

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Introduced built-in interpreter for SparkApplication.

@Copilot review requested due to automatic review settings on October 10, 2025 13:54
@karmada-bot added the kind/feature label (Categorizes issue or PR as related to a new feature.) on Oct 10, 2025

Summary of Changes

Hello @liaolecheng, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a new built-in resource interpreter for SparkApplication objects into Karmada. This feature significantly improves Karmada's understanding and management capabilities for Spark workloads by providing custom logic for health checks, detailed resource component extraction, accurate replica counting, and comprehensive status synchronization. These additions enable more robust and efficient orchestration of Spark applications across multiple clusters.

Highlights

  • Built-in Interpreter for SparkApplication: Introduced a new built-in resource interpreter for SparkApplication resources (apiVersion: sparkoperator.k8s.io/v1beta2), enhancing Karmada's ability to manage Spark workloads.
  • Custom Health Interpretation: Implemented a Lua script for healthInterpretation that determines the health of a SparkApplication based on its status.applicationState.state, marking it healthy if not 'PENDING' or 'UNKNOWN'.
  • Detailed Component Resource Extraction: Added Lua logic to componentResource to extract detailed resource requests (CPU, memory, GPU) and pod template configurations (nodeSelector, tolerations, affinity, priorityClassName) for both Spark driver and executor components.
  • Replica and Aggregated Resource Calculation: Provided a replicaResource Lua script to calculate the total number of replicas (driver + executors) and aggregate resource requirements by taking the maximum CPU and memory requests between the driver and executors (see the sketch after this list).
  • Status Aggregation and Reflection: Included Lua scripts for statusAggregation and statusReflection to correctly aggregate and reflect key status fields of SparkApplication objects, such as sparkApplicationId, applicationState, completionTime, driverInfo, executorState, and LastSubmissionAttemptTime.
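
For context, the replica aggregation described in the highlights could look roughly like the sketch below. This is a minimal illustration assuming Spark Operator v1beta2 field names (spec.driver.cores, spec.executor.cores, spec.executor.instances) and Karmada's GetReplicas entry point for replicaResource scripts; it is not the PR's actual script, and the memory handling is deliberately simplified because Lua cannot compare quantity strings like "512m" with math.max.

function GetReplicas(desiredObj)
    -- Total replicas: one driver plus the requested executor instances
    -- (the Spark Operator defaults executor instances to 1).
    local driver = desiredObj.spec.driver or {}
    local executor = desiredObj.spec.executor or {}
    local replicas = 1 + (executor.instances or 1)
    -- Per-replica requirement: take the larger CPU request of driver vs.
    -- executor; memory is passed through unchanged here for simplicity.
    local requirement = {
        resourceRequest = {
            cpu = math.max(driver.cores or 1, executor.cores or 1),
            memory = driver.memory or executor.memory,
        },
    }
    return replicas, requirement
end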
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@karmada-bot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) on Oct 10, 2025
Contributor

@Copilot left a comment


Pull Request Overview

This PR introduces a built-in resource interpreter for SparkApplication resources from the Spark Operator (v1beta2) to support better resource management in Karmada. The implementation provides customized interpretation logic for health status, resource requirements, replica counting, and status aggregation specific to Spark applications.

  • Adds comprehensive Lua scripts for interpreting SparkApplication health, replicas, and component resources
  • Implements status aggregation and reflection logic for SparkApplication status fields
  • Provides test data and test configuration to validate the interpreter functionality

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File | Description
customizations.yaml | Core interpreter configuration with Lua scripts for health, replicas, components, and status handling
customizations_tests.yaml | Test configuration defining operations to validate the interpreter
desired-sparkapplication.yaml | Test data representing the desired SparkApplication spec
observed-sparkapplication.yaml | Test data representing an observed SparkApplication with status
status-file.yaml | Test data containing status information for aggregation testing

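For readers who have not opened the files, a customizations.yaml of this kind generally follows Karmada's ResourceInterpreterCustomization schema. The outline below is a rough sketch under that assumption — the metadata name is hypothetical, the componentResource field name mirrors the InterpretComponent operation exercised later in this thread, and all script bodies are elided; consult the PR's actual file for the real contents.

apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: declarative-configuration-sparkapplication   # hypothetical name
spec:
  target:
    apiVersion: sparkoperator.k8s.io/v1beta2
    kind: SparkApplication
  customizations:
    healthInterpretation:
      luaScript: |
        function InterpretHealth(observedObj)
          -- health logic elided
        end
    replicaResource:
      luaScript: |
        function GetReplicas(desiredObj)
          -- replica/resource logic elided
        end
    componentResource:
      luaScript: |
        function InterpretComponent(desiredObj)
          -- driver/executor component extraction elided
        end
    statusReflection:
      luaScript: |
        function ReflectStatus(observedObj)
          -- status extraction elided
        end
    statusAggregation:
      luaScript: |
        function AggregateStatus(desiredObj, statusItems)
          -- aggregation logic elided
        end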



@gemini-code-assist bot left a comment


Code Review

This pull request introduces a default resource interpreter for SparkApplication resources, which is a great addition. The implementation covers health interpretation, component resource discovery, replica counting, status aggregation, and status reflection using Lua scripts. The overall approach is solid, but I've found several issues in the Lua scripts related to correctness and robustness, such as incorrect field names (camelCase vs. PascalCase), unsafe access to object properties, and use of global variables. I've left detailed comments with suggestions to address these points. Additionally, one of the test data files contains a field with an incorrect name, which should also be corrected.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a default resource interpreter for SparkApplication resources. The implementation is comprehensive, covering health interpretation, resource discovery for components, replica counting, status aggregation, and status reflection using Lua scripts. The changes are well-structured and include necessary test files. My review focuses on improving the Lua scripts for correctness and maintainability, as well as enhancing the test coverage and fixing minor issues in the test data. The most critical feedback is regarding the AggregateStatus function, which has flawed logic and uses global variables.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a resource interpreter for the SparkApplication CRD. The implementation contains several bugs in the Lua scripts for resource and status interpretation, including incorrect field names, improper use of helper functions, lack of nil-safety checks, and flawed status aggregation logic. I've provided detailed comments and suggestions to address these critical issues.
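
To illustrate the class of Lua issues these reviews flag — globals leaking between invocations and unguarded field access on possibly-nil tables — here is a generic before/after example. It is not code from the PR; the function names are made up for illustration.

-- Pitfall: 'count' is assigned without 'local', so it becomes a global that
-- persists across calls, and observedObj.status may be nil, which makes the
-- chained '.applicationState' access raise an error.
function BadReflectStatus(observedObj)
    count = (count or 0) + 1
    return observedObj.status.applicationState
end

-- Safer: declare locals and nil-check each step of the chain before use.
function GoodReflectStatus(observedObj)
    local status = observedObj and observedObj.status
    if not status then
        return nil
    end
    return status.applicationState
end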

@codecov-commenter

codecov-commenter commented Oct 10, 2025


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.88%. Comparing base (070927d) to head (bf3dc32).
⚠️ Report is 10 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6818      +/-   ##
==========================================
+ Coverage   45.84%   45.88%   +0.03%     
==========================================
  Files         690      690              
  Lines       57300    57392      +92     
==========================================
+ Hits        26271    26333      +62     
- Misses      29399    29423      +24     
- Partials     1630     1636       +6     
Flag | Coverage | Δ
unittests | 45.88% <ø> | +0.03% ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@liaolecheng force-pushed the spark branch 2 times, most recently from 4f95e73 to 9402ef8, on October 10, 2025 14:32
Member

@RainbowMango left a comment


/assign

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liaolecheng force-pushed the spark branch 3 times, most recently from ac4c55d to 857206a, on October 11, 2025 02:28
Member

@RainbowMango left a comment


I just looked at the InterpretHealth. Please cc me once it's ready for review, as I see you are updating this PR right now.

Please add a test report on how to test it.

Comment on lines 12 to 27
function InterpretHealth(observedObj)
    if observedObj and observedObj.status and observedObj.status.applicationState and observedObj.status.applicationState.state then
        local state = observedObj.status.applicationState.state
        return state ~= 'PENDING' and state ~= 'UNKNOWN'
    end
    return false
end

Suggested change

function InterpretHealth(observedObj)
    if not observedObj or
        not observedObj.status or
        not observedObj.status.applicationState or
        not observedObj.status.applicationState.state then
        return false
    end
    -- Only the 'FAILED' state is considered unhealthy. All other states are treated
    -- as healthy or recoverable.
    local state = observedObj.status.applicationState.state
    if state == 'FAILED' then
        return false
    end
    return true
end

@liaolecheng force-pushed the spark branch 2 times, most recently from 13ad9f5 to 7593da0, on October 11, 2025 04:08
@liaolecheng
Contributor Author

@RainbowMango I've revised the code, and it's now ready for review. I've been quite busy with other matters recently, but I’ll make sure to provide the test report (detailing how to test it) within the next couple of days. Thank you for your understanding and patience!

@liaolecheng
Contributor Author

liaolecheng commented Oct 12, 2025

Test Report

E2E Testing

Test Steps

  1. Apply the SparkApplication CRD on the Karmada control plane.
  2. Install the Spark Operator on the member1 cluster.
  3. Submit the spark-pi.yaml file on the Karmada control plane. The file content is as follows:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:3.5.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  sparkVersion: "3.5.0"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 2
    memory: "512m"
    labels:
      version: 3.5.0
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.5.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  4. Submit the spark-pi-pp.yaml file on the Karmada control plane. The file content is as follows:
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: spark-pi-pp
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication
      name: spark-pi
  placement:
    clusterAffinity:
      clusterNames:
        - member1

Test Result

  1. Check the interpreted components to verify the correctness of multi-component resource parsing.
(screenshot)
  2. Confirm that the aggregatedStatus correctly summarizes the health of resources across member clusters:
(screenshot)
  3. Check the spark-pi application on the Karmada control plane and member cluster respectively to verify the correctness of status parsing.
(screenshot)

Unit Testing

Test Steps

  1. Modify testdata/desired-sparkapplication.yaml to add GPU resources:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:3.5.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  sparkVersion: "3.5.0"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 2
    memory: "512m"
    gpu:
      name: "amd.com/gpu"
      quantity: 1 
    labels:
      version: 3.5.0
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    gpu:
      name: "nvidia.com/gpu"
      quantity: 1
    labels:
      version: 3.5.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  2. Use the following command to verify that GPU resources can be correctly interpreted:
karmadactl interpret -f customizations.yaml \
  --desired-file testdata/desired-sparkapplication.yaml \
  --operation InterpretComponent

Test Result

(screenshot)

…zations for SparkApplication

Signed-off-by: liaolecheng <[email protected]>
