Commit 37b7d32
[SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
### What changes were proposed in this pull request?
Use spark-submit to submit a pyspark app on Yarn, and set this in spark-env.sh:
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
The logs show that these local archives are still uploaded to the Yarn distributed cache:
yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://myhdfs/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
This PR fixes the issue by checking the files specified in PYSPARK_ARCHIVES_PATH: archives that use the `local:` scheme are no longer distributed to the Yarn distributed cache.
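The check described above can be sketched roughly as follows. This is a minimal, standalone illustration rather than the code in the patch: the object `PySparkArchiveFilter` and its methods `parse` and `archivesToDistribute` are hypothetical names introduced here, and the sketch assumes that an archive counts as local exactly when its URI scheme is `local`.

```scala
import java.net.URI

// Hypothetical helper, not part of Spark: illustrates skipping local: archives.
object PySparkArchiveFilter {
  // Assumption: the "local" scheme marks a file that already exists on every node.
  val LocalScheme = "local"

  /** Split a comma-separated PYSPARK_ARCHIVES_PATH value into its entries. */
  def parse(archivesPath: String): Seq[String] =
    archivesPath.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  /** Keep only the archives that would still need to be uploaded. */
  def archivesToDistribute(archives: Seq[String]): Seq[String] =
    archives.filter { f =>
      // getScheme returns null for a bare path; such paths are kept for upload.
      new URI(f).getScheme != LocalScheme
    }
}
```

With the PYSPARK_ARCHIVES_PATH value shown earlier, both entries use the `local:` scheme, so `archivesToDistribute(parse(...))` returns an empty sequence and nothing would be uploaded. An entry such as `file:/tmp/pyspark.zip` would still be kept for distribution.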
### Why are the changes needed?
So that a pyspark app can use local pyspark archives set in PYSPARK_ARCHIVES_PATH.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing tests and manual tests.
Closes apache#27598 from shanyu/shanyu-30845.
Authored-by: Shanyu Zhao <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
1 parent b333ed0 commit 37b7d32
1 file changed (6 additions, 1 deletion) under resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn
Diff (content not shown): original line 638 is replaced by new lines 638-643.