
Commit cf22d94

xkrogen authored and tgravescs committed
[SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature
### What changes were proposed in this pull request?

This PR will remove references to these "blacklist" and "whitelist" terms besides the blacklisting feature as a whole, which can be handled in a separate JIRA/PR. This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and most quite self-contained.

### Why are the changes needed?

As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world.

### Does this PR introduce _any_ user-facing change?

In the test file `HiveQueryFileTest`, a developer has the ability to specify the system property `spark.hive.whitelist` to specify a list of Hive query files that should be tested. This system property has been renamed to `spark.hive.includelist`. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests.

### How was this patch tested?

Existing tests should be suitable since no behavior changes are expected as a result of this PR.

Closes apache#28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists.

Authored-by: Erik Krogen <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
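For readers unfamiliar with this kind of soft rename, here is a minimal sketch of the fallback behavior described above. The helper name and log wording are assumptions for illustration only; the actual `HiveQueryFileTest` change may read the properties differently.

```scala
// Minimal sketch, not the actual HiveQueryFileTest code: prefer the new
// spark.hive.includelist property, fall back to the deprecated
// spark.hive.whitelist property and warn when it is used.
import org.slf4j.LoggerFactory

object IncludeListCompat {
  private val log = LoggerFactory.getLogger(getClass)

  def includeListProperty(): Option[String] =
    sys.props.get("spark.hive.includelist").orElse {
      sys.props.get("spark.hive.whitelist").map { value =>
        log.warn("spark.hive.whitelist is deprecated; use spark.hive.includelist instead")
        value
      }
    }
}
```

The returned value would presumably be split into the set of query files to run, just as the old property was, since the PR states no behavior change.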
1 parent 8950dcb commit cf22d94

52 files changed, +231 -219 lines changed


R/pkg/tests/fulltests/test_context.R (+1 -1)

@@ -139,7 +139,7 @@ test_that("utility function can be called", {
   expect_true(TRUE)
 })
 
-test_that("getClientModeSparkSubmitOpts() returns spark-submit args from whitelist", {
+test_that("getClientModeSparkSubmitOpts() returns spark-submit args from allowList", {
   e <- new.env()
   e[["spark.driver.memory"]] <- "512m"
   ops <- getClientModeSparkSubmitOpts("sparkrmain", e)

R/pkg/tests/fulltests/test_sparkSQL.R (+4 -4)

@@ -3921,14 +3921,14 @@ test_that("No extra files are created in SPARK_HOME by starting session and maki
   # before creating a SparkSession with enableHiveSupport = T at the top of this test file
   # (filesBefore). The test here is to compare that (filesBefore) against the list of files before
   # any test is run in run-all.R (sparkRFilesBefore).
-  # sparkRWhitelistSQLDirs is also defined in run-all.R, and should contain only 2 whitelisted dirs,
+  # sparkRAllowedSQLDirs is also defined in run-all.R, and should contain only 2 allowed dirs,
   # here allow the first value, spark-warehouse, in the diff, everything else should be exactly the
   # same as before any test is run.
-  compare_list(sparkRFilesBefore, setdiff(filesBefore, sparkRWhitelistSQLDirs[[1]]))
+  compare_list(sparkRFilesBefore, setdiff(filesBefore, sparkRAllowedSQLDirs[[1]]))
   # third, ensure only spark-warehouse and metastore_db are created when enableHiveSupport = T
   # note: as the note above, after running all tests in this file while enableHiveSupport = T, we
-  # check the list of files again. This time we allow both whitelisted dirs to be in the diff.
-  compare_list(sparkRFilesBefore, setdiff(filesAfter, sparkRWhitelistSQLDirs))
+  # check the list of files again. This time we allow both dirs to be in the diff.
+  compare_list(sparkRFilesBefore, setdiff(filesAfter, sparkRAllowedSQLDirs))
 })
 
 unlink(parquetPath)

R/pkg/tests/run-all.R (+2 -2)

@@ -35,8 +35,8 @@ if (identical(Sys.getenv("NOT_CRAN"), "true")) {
   install.spark(overwrite = TRUE)
 
   sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
-  sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
-  invisible(lapply(sparkRWhitelistSQLDirs,
+  sparkRAllowedSQLDirs <- c("spark-warehouse", "metastore_db")
+  invisible(lapply(sparkRAllowedSQLDirs,
     function(x) { unlink(file.path(sparkRDir, x), recursive = TRUE, force = TRUE)}))
   sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
 

common/network-common/src/main/java/org/apache/spark/network/crypto/README.md (+1 -1)

@@ -155,4 +155,4 @@ server will be able to understand. This will cause the server to close the conne
 attacker tries to send any command to the server. The attacker can just hold the channel open for
 some time, which will be closed when the server times out the channel. These issues could be
 separately mitigated by adding a shorter timeout for the first message after authentication, and
-potentially by adding host blacklists if a possible attack is detected from a particular host.
+potentially by adding host reject-lists if a possible attack is detected from a particular host.

core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (+15 -14)

@@ -188,23 +188,24 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
     processing.remove(path.getName)
   }
 
-  private val blacklist = new ConcurrentHashMap[String, Long]
+  private val inaccessibleList = new ConcurrentHashMap[String, Long]
 
   // Visible for testing
-  private[history] def isBlacklisted(path: Path): Boolean = {
-    blacklist.containsKey(path.getName)
+  private[history] def isAccessible(path: Path): Boolean = {
+    !inaccessibleList.containsKey(path.getName)
   }
 
-  private def blacklist(path: Path): Unit = {
-    blacklist.put(path.getName, clock.getTimeMillis())
+  private def markInaccessible(path: Path): Unit = {
+    inaccessibleList.put(path.getName, clock.getTimeMillis())
   }
 
   /**
-   * Removes expired entries in the blacklist, according to the provided `expireTimeInSeconds`.
+   * Removes expired entries in the inaccessibleList, according to the provided
+   * `expireTimeInSeconds`.
    */
-  private def clearBlacklist(expireTimeInSeconds: Long): Unit = {
+  private def clearInaccessibleList(expireTimeInSeconds: Long): Unit = {
     val expiredThreshold = clock.getTimeMillis() - expireTimeInSeconds * 1000
-    blacklist.asScala.retain((_, creationTime) => creationTime >= expiredThreshold)
+    inaccessibleList.asScala.retain((_, creationTime) => creationTime >= expiredThreshold)
   }
 
   private val activeUIs = new mutable.HashMap[(String, Option[String]), LoadedAppUI]()

@@ -470,7 +471,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
     logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
 
     val updated = Option(fs.listStatus(new Path(logDir))).map(_.toSeq).getOrElse(Nil)
-      .filter { entry => !isBlacklisted(entry.getPath) }
+      .filter { entry => isAccessible(entry.getPath) }
       .filter { entry => !isProcessing(entry.getPath) }
      .flatMap { entry => EventLogFileReader(fs, entry) }
      .filter { reader =>

@@ -687,8 +688,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       case e: AccessControlException =>
         // We don't have read permissions on the log file
         logWarning(s"Unable to read log $rootPath", e)
-        blacklist(rootPath)
-        // SPARK-28157 We should remove this blacklisted entry from the KVStore
+        markInaccessible(rootPath)
+        // SPARK-28157 We should remove this inaccessible entry from the KVStore
         // to handle permission-only changes with the same file sizes later.
         listing.delete(classOf[LogInfo], rootPath.toString)
       case e: Exception =>

@@ -956,8 +957,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       }
     }
 
-    // Clean the blacklist from the expired entries.
-    clearBlacklist(CLEAN_INTERVAL_S)
+    // Clean the inaccessibleList from the expired entries.
+    clearInaccessibleList(CLEAN_INTERVAL_S)
   }
 
   private def deleteAttemptLogs(

@@ -1334,7 +1335,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
 
   private def deleteLog(fs: FileSystem, log: Path): Boolean = {
     var deleted = false
-    if (isBlacklisted(log)) {
+    if (!isAccessible(log)) {
       logDebug(s"Skipping deleting $log as we don't have permissions on it.")
     } else {
       try {
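Stepping back from the rename, the methods above implement a small mark/check/expire pattern over a timestamped map. Here is a self-contained sketch of that pattern, simplified and renamed for illustration; it is not the real `FsHistoryProvider` class, and the clock parameter is an assumption.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

// Simplified illustration of the inaccessible-list lifecycle shown in the diff above.
class InaccessibleList(clock: () => Long = () => System.currentTimeMillis()) {
  private val entries = new ConcurrentHashMap[String, Long]

  // Record when a path became unreadable.
  def markInaccessible(name: String): Unit = entries.put(name, clock())

  // A path is accessible as long as it has not been marked.
  def isAccessible(name: String): Boolean = !entries.containsKey(name)

  // Drop entries older than expireTimeInSeconds so the path gets retried later,
  // mirroring what clearInaccessibleList does above.
  def clear(expireTimeInSeconds: Long): Unit = {
    val threshold = clock() - expireTimeInSeconds * 1000
    entries.asScala.retain((_, markedAt) => markedAt >= threshold)
  }
}
```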

core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala (+2 -2)

@@ -411,7 +411,7 @@ private[spark] object RestSubmissionClient {
 
   // SPARK_HOME and SPARK_CONF_DIR are filtered out because they are usually wrong
   // on the remote machine (SPARK-12345) (SPARK-25934)
-  private val BLACKLISTED_SPARK_ENV_VARS = Set("SPARK_ENV_LOADED", "SPARK_HOME", "SPARK_CONF_DIR")
+  private val EXCLUDED_SPARK_ENV_VARS = Set("SPARK_ENV_LOADED", "SPARK_HOME", "SPARK_CONF_DIR")
   private val REPORT_DRIVER_STATUS_INTERVAL = 1000
   private val REPORT_DRIVER_STATUS_MAX_TRIES = 10
   val PROTOCOL_VERSION = "v1"

@@ -421,7 +421,7 @@ private[spark] object RestSubmissionClient {
   */
  private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
    env.filterKeys { k =>
-      (k.startsWith("SPARK_") && !BLACKLISTED_SPARK_ENV_VARS.contains(k)) || k.startsWith("MESOS_")
+      (k.startsWith("SPARK_") && !EXCLUDED_SPARK_ENV_VARS.contains(k)) || k.startsWith("MESOS_")
    }.toMap
  }
 
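As a quick, hypothetical REPL-style illustration of what the renamed constant excludes (not part of the patch; the sample environment is made up):

```scala
// Mirrors the filtering rule shown above.
val excluded = Set("SPARK_ENV_LOADED", "SPARK_HOME", "SPARK_CONF_DIR")
val env = Map(
  "SPARK_ENV_LOADED" -> "1",        // dropped: in the excluded set
  "SPARK_LOCAL_DIRS" -> "/tmp",     // kept: SPARK_ prefix, not excluded
  "MESOS_DIRECTORY" -> "/var/lib",  // kept: MESOS_ prefix
  "PATH" -> "/usr/bin")             // dropped: neither prefix
val filtered = env.filterKeys { k =>
  (k.startsWith("SPARK_") && !excluded.contains(k)) || k.startsWith("MESOS_")
}.toMap
// filtered == Map("SPARK_LOCAL_DIRS" -> "/tmp", "MESOS_DIRECTORY" -> "/var/lib")
```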

core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala (+1 -1)

@@ -151,7 +151,7 @@ private[spark] class OutputCommitCoordinator(conf: SparkConf, isDriver: Boolean)
         logInfo(s"Task was denied committing, stage: $stage.$stageAttempt, " +
           s"partition: $partition, attempt: $attemptNumber")
       case _ =>
-        // Mark the attempt as failed to blacklist from future commit protocol
+        // Mark the attempt as failed to exclude from future commit protocol
         val taskId = TaskIdentifier(stageAttempt, attemptNumber)
         stageState.failures.getOrElseUpdate(partition, mutable.Set()) += taskId
         if (stageState.authorizedCommitters(partition) == taskId) {

core/src/main/scala/org/apache/spark/util/JsonProtocol.scala (+2 -2)

@@ -328,11 +328,11 @@ private[spark] object JsonProtocol {
       ("Accumulables" -> accumulablesToJson(taskInfo.accumulables))
   }
 
-  private lazy val accumulableBlacklist = Set("internal.metrics.updatedBlockStatuses")
+  private lazy val accumulableExcludeList = Set("internal.metrics.updatedBlockStatuses")
 
   def accumulablesToJson(accumulables: Iterable[AccumulableInfo]): JArray = {
     JArray(accumulables
-      .filterNot(_.name.exists(accumulableBlacklist.contains))
+      .filterNot(_.name.exists(accumulableExcludeList.contains))
       .toList.map(accumulableInfoToJson))
   }
 

core/src/test/scala/org/apache/spark/ThreadAudit.scala (+2 -2)

@@ -26,7 +26,7 @@ import org.apache.spark.internal.Logging
  */
 trait ThreadAudit extends Logging {
 
-  val threadWhiteList = Set(
+  val threadExcludeList = Set(
     /**
      * Netty related internal threads.
      * These are excluded because their lifecycle is handled by the netty itself

@@ -108,7 +108,7 @@ trait ThreadAudit extends Logging {
 
     if (threadNamesSnapshot.nonEmpty) {
       val remainingThreadNames = runningThreadNames().diff(threadNamesSnapshot)
-        .filterNot { s => threadWhiteList.exists(s.matches(_)) }
+        .filterNot { s => threadExcludeList.exists(s.matches(_)) }
       if (remainingThreadNames.nonEmpty) {
         logWarning(s"\n\n===== POSSIBLE THREAD LEAK IN SUITE $shortSuiteName, " +
           s"thread names: ${remainingThreadNames.mkString(", ")} =====\n")

core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala (+11 -11)

@@ -1210,17 +1210,17 @@ class SparkSubmitSuite
     testRemoteResources(enableHttpFs = true)
   }
 
-  test("force download from blacklisted schemes") {
-    testRemoteResources(enableHttpFs = true, blacklistSchemes = Seq("http"))
+  test("force download from forced schemes") {
+    testRemoteResources(enableHttpFs = true, forceDownloadSchemes = Seq("http"))
   }
 
   test("force download for all the schemes") {
-    testRemoteResources(enableHttpFs = true, blacklistSchemes = Seq("*"))
+    testRemoteResources(enableHttpFs = true, forceDownloadSchemes = Seq("*"))
   }
 
   private def testRemoteResources(
       enableHttpFs: Boolean,
-      blacklistSchemes: Seq[String] = Nil): Unit = {
+      forceDownloadSchemes: Seq[String] = Nil): Unit = {
     val hadoopConf = new Configuration()
     updateConfWithFakeS3Fs(hadoopConf)
     if (enableHttpFs) {

@@ -1237,8 +1237,8 @@ class SparkSubmitSuite
     val tmpHttpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir)
     val tmpHttpJarPath = s"http://${new File(tmpHttpJar.toURI).getAbsolutePath}"
 
-    val forceDownloadArgs = if (blacklistSchemes.nonEmpty) {
-      Seq("--conf", s"spark.yarn.dist.forceDownloadSchemes=${blacklistSchemes.mkString(",")}")
+    val forceDownloadArgs = if (forceDownloadSchemes.nonEmpty) {
+      Seq("--conf", s"spark.yarn.dist.forceDownloadSchemes=${forceDownloadSchemes.mkString(",")}")
     } else {
       Nil
     }

@@ -1256,19 +1256,19 @@ class SparkSubmitSuite
 
     val jars = conf.get("spark.yarn.dist.jars").split(",").toSet
 
-    def isSchemeBlacklisted(scheme: String) = {
-      blacklistSchemes.contains("*") || blacklistSchemes.contains(scheme)
+    def isSchemeForcedDownload(scheme: String) = {
+      forceDownloadSchemes.contains("*") || forceDownloadSchemes.contains(scheme)
     }
 
-    if (!isSchemeBlacklisted("s3")) {
+    if (!isSchemeForcedDownload("s3")) {
       assert(jars.contains(tmpS3JarPath))
     }
 
-    if (enableHttpFs && blacklistSchemes.isEmpty) {
+    if (enableHttpFs && forceDownloadSchemes.isEmpty) {
       // If Http FS is supported by yarn service, the URI of remote http resource should
       // still be remote.
       assert(jars.contains(tmpHttpJarPath))
-    } else if (!enableHttpFs || isSchemeBlacklisted("http")) {
+    } else if (!enableHttpFs || isSchemeForcedDownload("http")) {
       // If Http FS is not supported by yarn service, or http scheme is configured to be force
       // downloading, the URI of remote http resource should be changed to a local one.
       val jarName = new File(tmpHttpJar.toURI).getName

core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala (+4 -4)

@@ -1117,7 +1117,7 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging {
     }
   }
 
-  test("SPARK-24948: blacklist files we don't have read permission on") {
+  test("SPARK-24948: ignore files we don't have read permission on") {
     val clock = new ManualClock(1533132471)
     val provider = new FsHistoryProvider(createTestConf(), clock)
     val accessDenied = newLogFile("accessDenied", None, inProgress = false)

@@ -1137,17 +1137,17 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging {
     updateAndCheck(mockedProvider) { list =>
       list.size should be(1)
     }
-    // Doing 2 times in order to check the blacklist filter too
+    // Doing 2 times in order to check the inaccessibleList filter too
     updateAndCheck(mockedProvider) { list =>
       list.size should be(1)
     }
     val accessDeniedPath = new Path(accessDenied.getPath)
-    assert(mockedProvider.isBlacklisted(accessDeniedPath))
+    assert(!mockedProvider.isAccessible(accessDeniedPath))
     clock.advance(24 * 60 * 60 * 1000 + 1) // add a bit more than 1d
     isReadable = true
     mockedProvider.cleanLogs()
     updateAndCheck(mockedProvider) { list =>
-      assert(!mockedProvider.isBlacklisted(accessDeniedPath))
+      assert(mockedProvider.isAccessible(accessDeniedPath))
       assert(list.exists(_.name == "accessDenied"))
       assert(list.exists(_.name == "accessGranted"))
       list.size should be(2)

core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala (+9 -5)

@@ -48,24 +48,28 @@ import org.apache.spark.util.CallSite
 
 private[spark] class SparkUICssErrorHandler extends DefaultCssErrorHandler {
 
-  private val cssWhiteList = List("bootstrap.min.css", "vis-timeline-graph2d.min.css")
+  /**
+   * Some libraries have warn/error messages that are too noisy for the tests; exclude them from
+   * normal error handling to avoid logging these.
+   */
+  private val cssExcludeList = List("bootstrap.min.css", "vis-timeline-graph2d.min.css")
 
-  private def isInWhileList(uri: String): Boolean = cssWhiteList.exists(uri.endsWith)
+  private def isInExcludeList(uri: String): Boolean = cssExcludeList.exists(uri.endsWith)
 
   override def warning(e: CSSParseException): Unit = {
-    if (!isInWhileList(e.getURI)) {
+    if (!isInExcludeList(e.getURI)) {
       super.warning(e)
     }
   }
 
   override def fatalError(e: CSSParseException): Unit = {
-    if (!isInWhileList(e.getURI)) {
+    if (!isInExcludeList(e.getURI)) {
       super.fatalError(e)
     }
   }
 
   override def error(e: CSSParseException): Unit = {
-    if (!isInWhileList(e.getURI)) {
+    if (!isInExcludeList(e.getURI)) {
       super.error(e)
     }
   }

dev/sparktestsupport/modules.py (+5 -5)

@@ -32,7 +32,7 @@ class Module(object):
     """
 
     def __init__(self, name, dependencies, source_file_regexes, build_profile_flags=(), environ={},
-                 sbt_test_goals=(), python_test_goals=(), blacklisted_python_implementations=(),
+                 sbt_test_goals=(), python_test_goals=(), excluded_python_implementations=(),
                  test_tags=(), should_run_r_tests=False, should_run_build_tests=False):
         """
         Define a new module.

@@ -49,7 +49,7 @@ def __init__(self, name, dependencies, source_file_regexes, build_profile_flags=
            module are changed.
        :param sbt_test_goals: A set of SBT test goals for testing this module.
        :param python_test_goals: A set of Python test goals for testing this module.
-       :param blacklisted_python_implementations: A set of Python implementations that are not
+       :param excluded_python_implementations: A set of Python implementations that are not
            supported by this module's Python components. The values in this set should match
            strings returned by Python's `platform.python_implementation()`.
        :param test_tags A set of tags that will be excluded when running unit tests if the module

@@ -64,7 +64,7 @@ def __init__(self, name, dependencies, source_file_regexes, build_profile_flags=
        self.build_profile_flags = build_profile_flags
        self.environ = environ
        self.python_test_goals = python_test_goals
-       self.blacklisted_python_implementations = blacklisted_python_implementations
+       self.excluded_python_implementations = excluded_python_implementations
        self.test_tags = test_tags
        self.should_run_r_tests = should_run_r_tests
        self.should_run_build_tests = should_run_build_tests

@@ -524,7 +524,7 @@ def __hash__(self):
        "pyspark.mllib.tests.test_streaming_algorithms",
        "pyspark.mllib.tests.test_util",
    ],
-   blacklisted_python_implementations=[
+   excluded_python_implementations=[
        "PyPy"  # Skip these tests under PyPy since they require numpy and it isn't available there
    ]
 )

@@ -565,7 +565,7 @@ def __hash__(self):
        "pyspark.ml.tests.test_tuning",
        "pyspark.ml.tests.test_wrapper",
    ],
-   blacklisted_python_implementations=[
+   excluded_python_implementations=[
        "PyPy"  # Skip these tests under PyPy since they require numpy and it isn't available there
    ]
 )
