Commit 37b7d32

shanyu authored and tgravescs committed
[SPARK-30845] Do not upload local pyspark archives for spark-submit on Yarn
### What changes were proposed in this pull request?

Use spark-submit to submit a pyspark app on Yarn, and set this in spark-env.sh:

    export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip

You can see that these local archives are still uploaded to the Yarn distributed cache:

    yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://myhdfs/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip

This PR fixes the issue by checking the files specified in PYSPARK_ARCHIVES_PATH: if they are local archives, they are not distributed to the Yarn distributed cache.

### Why are the changes needed?

For pyspark apps to support local pyspark archives set in PYSPARK_ARCHIVES_PATH.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests and manual tests.

Closes apache#27598 from shanyu/shanyu-30845.

Authored-by: Shanyu Zhao <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
1 parent b333ed0 commit 37b7d32

1 file changed, +6 -1 lines changed

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

Lines changed: 6 additions & 1 deletion
    @@ -635,7 +635,12 @@ private[spark] class Client(
           distribute(args.primaryPyFile, appMasterOnly = true)
         }
     
    -    pySparkArchives.foreach { f => distribute(f) }
    +    pySparkArchives.foreach { f =>
    +      val uri = Utils.resolveURI(f)
    +      if (uri.getScheme != Utils.LOCAL_SCHEME) {
    +        distribute(f)
    +      }
    +    }
     
         // The python files list needs to be treated especially. All files that are not an
         // archive need to be placed in a subdirectory that will be added to PYTHONPATH.
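
The scheme check in this change can be tried outside of Spark. The following is a minimal sketch, not the code in Client.scala: `LocalArchiveFilter`, `parseArchives`, `resolveUri` and `needsUpload` are hypothetical names introduced for illustration, and `resolveUri` only approximates `Utils.resolveURI` (keep an explicit scheme, otherwise treat the string as a path on the local filesystem). The comma-separated format of `PYSPARK_ARCHIVES_PATH` is taken from the example in the commit message.

```scala
import java.io.File
import java.net.URI
import scala.util.Try

object LocalArchiveFilter {
  // Same string value as Utils.LOCAL_SCHEME in Spark.
  val LocalScheme = "local"

  // Hypothetical helper: split a PYSPARK_ARCHIVES_PATH-style value on commas.
  def parseArchives(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Hypothetical stand-in for Utils.resolveURI (an approximation, not Spark's code):
  // a string with an explicit scheme is used as-is, anything else becomes a file: URI.
  def resolveUri(path: String): URI =
    Try(new URI(path)).toOption
      .filter(_.getScheme != null)
      .getOrElse(new File(path).getAbsoluteFile.toURI)

  // An archive needs to be uploaded unless its scheme is "local".
  def needsUpload(path: String): Boolean =
    resolveUri(path).getScheme != LocalScheme

  def main(args: Array[String]): Unit = {
    val value =
      "local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip"
    parseArchives(value).foreach { f =>
      println(s"$f needsUpload=${needsUpload(f)}")
    }
  }
}
```

With the value from the commit message, both entries carry the `local` scheme, so `needsUpload` is false for each and nothing would be handed to `distribute`. A bare path such as `/opt/spark/python/lib/pyspark.zip` resolves to a `file:` URI in this sketch and would still be uploaded.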
