[SPARK-45969][DOCS] Document configuration change of executor failure tracker

### What changes were proposed in this pull request?

It's a follow-up of SPARK-41210 (filed under a new JIRA ticket because SPARK-41210 was already released in 3.5.0). This PR updates the docs and migration guide to cover the configuration change of the executor failure tracker.

### Why are the changes needed?

The docs update was missing from the previous changes; it was also requested by tgravescs in apache@40872e9#r132516892.

### Does this PR introduce _any_ user-facing change?

Yes, the docs changed.

### How was this patch tested?

Review

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#43863 from pan3793/SPARK-45969.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
pan3793 authored and dongjoon-hyun committed Dec 10, 2023
1 parent a777130 commit 7a43de1
Showing 4 changed files with 29 additions and 19 deletions.
@@ -931,7 +931,7 @@ package object config {

   private[spark] val MAX_EXECUTOR_FAILURES =
     ConfigBuilder("spark.executor.maxNumFailures")
-      .doc("Spark exits if the number of failed executors exceeds this threshold. " +
+      .doc("The maximum number of executor failures before failing the application. " +
         "This configuration only takes effect on YARN, or Kubernetes when " +
         "`spark.kubernetes.allocation.pods.allocator` is set to 'direct'.")
       .version("3.5.0")
@@ -940,7 +940,7 @@ package object config {

   private[spark] val EXECUTOR_ATTEMPT_FAILURE_VALIDITY_INTERVAL_MS =
     ConfigBuilder("spark.executor.failuresValidityInterval")
-      .doc("Interval after which Executor failures will be considered independent and not " +
+      .doc("Interval after which executor failures will be considered independent and not " +
         "accumulate towards the attempt count. This configuration only takes effect on YARN, " +
         "or Kubernetes when `spark.kubernetes.allocation.pods.allocator` is set to 'direct'.")
       .version("3.5.0")
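To make the documented semantics concrete, here is a minimal sketch, not Spark's actual implementation, of the failure-tracking behavior these two configurations describe: failures older than the validity interval stop counting toward the maximum. The class and method names are illustrative.

```scala
import scala.collection.mutable

// Hypothetical sketch of the documented semantics; not Spark's internal code.
class FailureTracker(maxFailures: Int, validityIntervalMs: Long) {
  private val failureTimes = mutable.Queue.empty[Long]

  def registerFailure(nowMs: Long): Unit = failureTimes.enqueue(nowMs)

  // True once more than `maxFailures` failures fall inside the validity window.
  def shouldFailApplication(nowMs: Long): Boolean = {
    // A non-positive interval means failures never expire, matching the
    // behavior when spark.executor.failuresValidityInterval is unset.
    if (validityIntervalMs > 0) {
      while (failureTimes.nonEmpty && nowMs - failureTimes.head > validityIntervalMs) {
        failureTimes.dequeue()
      }
    }
    failureTimes.size > maxFailures
  }
}
```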
docs/configuration.md: 21 additions & 0 deletions
@@ -522,6 +522,27 @@ of the most common options to set are:
</td>
<td>3.2.0</td>
</tr>
+<tr>
+  <td><code>spark.executor.maxNumFailures</code></td>
+  <td>numExecutors * 2, with minimum of 3</td>
+  <td>
+    The maximum number of executor failures before failing the application.
+    This configuration only takes effect on YARN, or Kubernetes when
+    `spark.kubernetes.allocation.pods.allocator` is set to 'direct'.
+  </td>
+  <td>3.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.failuresValidityInterval</code></td>
+  <td>(none)</td>
+  <td>
+    Interval after which executor failures will be considered independent and
+    not accumulate towards the attempt count.
+    This configuration only takes effect on YARN, or Kubernetes when
+    `spark.kubernetes.allocation.pods.allocator` is set to 'direct'.
+  </td>
+  <td>3.5.0</td>
+</tr>
</table>
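As a usage illustration (not part of this commit), here is a minimal sketch of enabling both properties on Kubernetes, where, per the table above, they only take effect with the 'direct' pods allocator; all values shown are assumptions for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Illustrative values only: tolerate up to 8 executor failures, and let
// failures older than one hour age out of the count.
val conf = new SparkConf()
  .set("spark.kubernetes.allocation.pods.allocator", "direct")
  .set("spark.executor.maxNumFailures", "8")
  .set("spark.executor.failuresValidityInterval", "1h")

val spark = SparkSession.builder().config(conf).getOrCreate()
```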

Apart from these, the following properties are also available, and may be useful in some situations:
docs/core-migration-guide.md: 6 additions & 0 deletions
@@ -32,6 +32,12 @@ license: |

- In Spark 4.0, support for Apache Mesos as a resource manager was removed.

+## Upgrading from Core 3.4 to 3.5
+
+- Since Spark 3.5, `spark.yarn.executor.failuresValidityInterval` is deprecated. Use `spark.executor.failuresValidityInterval` instead.
+
+- Since Spark 3.5, `spark.yarn.max.executor.failures` is deprecated. Use `spark.executor.maxNumFailures` instead.
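Not part of the diff, but as a sketch of this migration, assuming an application previously set the YARN-specific keys (the values are illustrative):

```scala
import org.apache.spark.SparkConf

// Before Spark 3.5 (YARN-only keys, now deprecated):
//   spark.yarn.max.executor.failures             = 6
//   spark.yarn.executor.failuresValidityInterval = 30m

// Since Spark 3.5, the resource-manager-agnostic keys are preferred:
val conf = new SparkConf()
  .set("spark.executor.maxNumFailures", "6")
  .set("spark.executor.failuresValidityInterval", "30m")
```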

## Upgrading from Core 3.3 to 3.4

- Since Spark 3.4, the Spark driver will own `PersistentVolumeClaim`s and try to reuse them if they are not assigned to live executors. To restore the behavior before Spark 3.4, you can set `spark.kubernetes.driver.ownPersistentVolumeClaim` to `false` and `spark.kubernetes.driver.reusePersistentVolumeClaim` to `false`.
Expand Down
docs/running-on-yarn.md: 0 additions & 17 deletions
@@ -291,14 +291,6 @@ To use a custom metrics.properties for the application master and executors, upd
</td>
<td>1.4.0</td>
</tr>
-<tr>
-  <td><code>spark.yarn.max.executor.failures</code></td>
-  <td>numExecutors * 2, with minimum of 3</td>
-  <td>
-    The maximum number of executor failures before failing the application.
-  </td>
-  <td>1.0.0</td>
-</tr>
<tr>
<td><code>spark.yarn.historyServer.address</code></td>
<td>(none)</td>
@@ -499,15 +491,6 @@ To use a custom metrics.properties for the application master and executors, upd
</td>
<td>3.3.0</td>
</tr>
-<tr>
-  <td><code>spark.yarn.executor.failuresValidityInterval</code></td>
-  <td>(none)</td>
-  <td>
-    Defines the validity interval for executor failure tracking.
-    Executor failures which are older than the validity interval will be ignored.
-  </td>
-  <td>2.0.0</td>
-</tr>
<tr>
<td><code>spark.yarn.submit.waitAppCompletion</code></td>
<td><code>true</code></td>
