You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
### What changes were proposed in this pull request?
Use spark-submit to submit a pyspark app on Yarn, and set this in spark-env.sh:
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
You can see that these local archives are still uploaded to Yarn distributed cache:
yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://myhdfs/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
This PR fix this issue by checking the files specified in PYSPARK_ARCHIVES_PATH, if they are local archives, don't distribute to Yarn dist cache.
### Why are the changes needed?
For pyspark appp to support local pyspark archives set in PYSPARK_ARCHIVES_PATH.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing tests and manual tests.
Closesapache#27598 from shanyu/shanyu-30845.
Authored-by: Shanyu Zhao <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
0 commit comments