[vpj] Exclude Spark's transitive dependencies that are known to be bad (#871)

In the PR that added Apache Spark, a few dependencies were bumped transitively and some of them are known to have critical issues:
1. Avro 1.11 is known to be susceptible to a deadlock bug (AVRO-3243)
2. log4j 2.17.2 has a performance regression (LOG4J2-3487)
nisargthakkar authored Feb 22, 2024
1 parent 85cb304 commit c64d119
Showing 2 changed files with 22 additions and 0 deletions.
1 change: 1 addition & 0 deletions build.gradle
@@ -105,6 +105,7 @@ ext.libraries = [
kafkaClientsTest: "${kafkaGroup}:kafka-clients:${kafkaVersion}:test",
log4j2api: "org.apache.logging.log4j:log4j-api:${log4j2Version}",
log4j2core: "org.apache.logging.log4j:log4j-core:${log4j2Version}",
log4j2Slf4j: "org.apache.logging.log4j:log4j-slf4j-impl:${log4j2Version}",
mail: 'javax.mail:mail:1.4.4',
mapreduceClientCore: "org.apache.hadoop:hadoop-mapreduce-client-core:${hadoopVersion}",
mapreduceClientJobClient: "org.apache.hadoop:hadoop-mapreduce-client-jobclient:${hadoopVersion}",
21 changes: 21 additions & 0 deletions clients/venice-push-job/build.gradle
@@ -35,6 +35,12 @@ dependencies {
// Spark 3.3 depends on hadoop-client-runtime and hadoop-client-api, which are shaded jars that were added in Hadoop 3.0.3
exclude group: 'org.apache.hadoop', module: 'hadoop-client-runtime'
exclude group: 'org.apache.hadoop', module: 'hadoop-client-api'

// Spark 3.3 depends on Avro 1.11 which is known to be susceptible to a deadlock bug (AVRO-3243)
exclude group: 'org.apache.avro'

// Spark 3.3 depends on log4j 2.17.2 which has a performance regression (LOG4J2-3487)
exclude group: 'org.apache.logging.log4j'
}
implementation (libraries.apacheSparkCore) {
// Spark 3.1 depends on Avro 1.8.2 - which uses avro-mapred with the hadoop2 classifier. Starting from Avro 1.9
@@ -44,6 +50,12 @@ dependencies {
// Spark 3.3 depends on hadoop-client-runtime and hadoop-client-api, which are shaded jars that were added in Hadoop 3.0.3
exclude group: 'org.apache.hadoop', module: 'hadoop-client-runtime'
exclude group: 'org.apache.hadoop', module: 'hadoop-client-api'

// Spark 3.3 depends on Avro 1.11 which is known to be susceptible to a deadlock bug (AVRO-3243)
exclude group: 'org.apache.avro'

// Spark 3.3 depends on log4j 2.17.2 which has a performance regression (LOG4J2-3487)
exclude group: 'org.apache.logging.log4j'
}
implementation (libraries.apacheSparkSql) {
// Spark 3.1 depends on Avro 1.8.2 - which uses avro-mapred with the hadoop2 classifier. Starting from Avro 1.9
@@ -53,13 +65,22 @@ dependencies {
// Spark 3.3 depends on hadoop-client-runtime and hadoop-client-api, which are shaded jars that were added in Hadoop 3.0.3
exclude group: 'org.apache.hadoop', module: 'hadoop-client-runtime'
exclude group: 'org.apache.hadoop', module: 'hadoop-client-api'

// Spark 3.3 depends on Avro 1.11 which is known to be susceptible to a deadlock bug (AVRO-3243)
exclude group: 'org.apache.avro'

// Spark 3.3 depends on log4j 2.17.2 which has a performance regression (LOG4J2-3487)
exclude group: 'org.apache.logging.log4j'
}

// Spark versions 3.2.X - 3.3.X are compiled with antlr4 4.8. In our classpath, antlr4 version 4.5 is used. This
// discrepancy causes errors at runtime.
implementation libraries.antlr4
implementation libraries.antlr4Runtime

// Spark needs log4j-slf4j-impl that got excluded via exclude group: 'org.apache.logging.log4j'
implementation libraries.log4j2Slf4j

implementation project(':clients:venice-thin-client') // Needed by the KME SchemaReader

implementation libraries.commonsIo
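
One way to guard against these versions creeping back in is a small check on the resolved runtime classpath. The following is a minimal sketch, not part of this commit: the task name `verifyNoKnownBadVersions` is hypothetical, and it assumes the `java` plugin is applied so that a `runtimeClasspath` configuration exists.

```groovy
// Hypothetical verification task (not part of this commit).
// Assumes the 'java' plugin is applied, so 'runtimeClasspath' exists.
tasks.register('verifyNoKnownBadVersions') {
  doLast {
    // group -> version prefix that should not appear on the runtime classpath
    def knownBad = [
        'org.apache.avro'         : '1.11',   // AVRO-3243
        'org.apache.logging.log4j': '2.17.2'  // LOG4J2-3487
    ]

    // Walk every component Gradle resolved for the runtime classpath and
    // keep those whose group and version match a known-bad entry.
    def offenders = configurations.runtimeClasspath.incoming.resolutionResult.allComponents
        .collect { it.moduleVersion }
        .findAll { id ->
          id != null && knownBad[id.group] != null && id.version.startsWith(knownBad[id.group])
        }

    if (!offenders.isEmpty()) {
      throw new GradleException("Known-bad transitive dependencies resolved: ${offenders.join(', ')}")
    }
  }
}
```

For a one-off manual check, Gradle's built-in `dependencyInsight` task reports which version of a module was selected and why, for example `./gradlew :clients:venice-push-job:dependencyInsight --configuration runtimeClasspath --dependency avro` (the project path here is inferred from the file location and may differ).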