Commit 4fc8ee7

beliefer authored and HyukjinKwon committed
[SPARK-31295][DOC] Supplement version for configuration appear in doc
### What changes were proposed in this pull request?

This PR supplements the version for configurations that appear in the docs. I sorted out the information shown below.

**docs/spark-standalone.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.deploy.retainedApplications | 0.8.0 | None | 46eecd1#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.retainedDrivers | 1.1.0 | None | 7446f5f#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.spreadOut | 0.6.1 | None | bb2b9ff#diff-0e7ae91819fc8f7b47b0f97be7116325 |
spark.deploy.defaultCores | 0.9.0 | None | d8bcc8e#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.maxExecutorRetries | 1.6.3 | SPARK-16956 | ace458f#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.worker.resource.{resourceName}.amount | 3.0.0 | SPARK-27371 | cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |
spark.worker.resource.{resourceName}.discoveryScript | 3.0.0 | SPARK-27371 | cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |
spark.worker.resourcesFile | 3.0.0 | SPARK-27369 | 7cbe01e#diff-b2fc8d6ab7ac5735085e2d6cfacb95da |
spark.shuffle.service.db.enabled | 3.0.0 | SPARK-26288 | 8b0aa59#diff-6bdad48cfc34314e89599655442ff210 |
spark.storage.cleanupFilesAfterExecutorExit | 2.4.0 | SPARK-24340 | 8ef167a#diff-916ca56b663f178f302c265b7ef38499 |
spark.deploy.recoveryMode | 0.8.1 | None | d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.recoveryDirectory | 0.8.1 | None | d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668 |

**docs/sql-data-sources-avro.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.legacy.replaceDatabricksSparkAvro.enabled | 2.4.0 | SPARK-25129 | ac0174e#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.avro.compression.codec | 2.4.0 | SPARK-24881 | 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.avro.deflate.level | 2.4.0 | SPARK-24881 | 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13 |

**docs/sql-data-sources-orc.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.orc.impl | 2.3.0 | SPARK-20728 | 326f1d6#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.orc.enableVectorizedReader | 2.3.0 | SPARK-16060 | 60f6b99#diff-9a6b543db706f1a90f790783d6930a13 |

**docs/sql-data-sources-parquet.md**

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.parquet.binaryAsString | 1.1.1 | SPARK-2927 | de501e1#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.int96AsTimestamp | 1.3.0 | SPARK-4987 | 67d5220#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.compression.codec | 1.1.1 | SPARK-3131 | 3a9d874#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.filterPushdown | 1.2.0 | SPARK-4391 | 576688a#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.hive.convertMetastoreParquet | 1.1.1 | SPARK-2406 | cc4015d#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.parquet.mergeSchema | 1.5.0 | SPARK-8690 | 246265f#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.writeLegacyFormat | 1.6.0 | SPARK-10400 | 01cd688#diff-41ef65b9ef5b518f77e2a03559893f4d |

### Why are the changes needed?

Supplemental configuration version information.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Jenkins test.

Closes apache#28064 from beliefer/supplement-doc-for-data-sources.

Authored-by: beliefer <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent fc5d67f · commit 4fc8ee7

4 files changed: +50 -9 lines changed

docs/spark-standalone.md

Lines changed: 12 additions & 1 deletion
@@ -192,13 +192,15 @@ SPARK_MASTER_OPTS supports the following system properties:
  <td>
    The maximum number of completed applications to display. Older applications will be dropped from the UI to maintain this limit.<br/>
  </td>
+  <td>0.8.0</td>
</tr>
<tr>
  <td><code>spark.deploy.retainedDrivers</code></td>
  <td>200</td>
  <td>
    The maximum number of completed drivers to display. Older drivers will be dropped from the UI to maintain this limit.<br/>
  </td>
+  <td>1.1.0</td>
</tr>
<tr>
  <td><code>spark.deploy.spreadOut</code></td>
@@ -208,6 +210,7 @@ SPARK_MASTER_OPTS supports the following system properties:
    to consolidate them onto as few nodes as possible. Spreading out is usually better for
    data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
  </td>
+  <td>0.6.1</td>
</tr>
<tr>
  <td><code>spark.deploy.defaultCores</code></td>
@@ -219,6 +222,7 @@ SPARK_MASTER_OPTS supports the following system properties:
    Set this lower on a shared cluster to prevent users from grabbing
    the whole cluster by default. <br/>
  </td>
+  <td>0.9.0</td>
</tr>
<tr>
  <td><code>spark.deploy.maxExecutorRetries</code></td>
@@ -234,6 +238,7 @@ SPARK_MASTER_OPTS supports the following system properties:
    <code>-1</code>.
    <br/>
  </td>
+  <td>1.6.3</td>
</tr>
<tr>
  <td><code>spark.worker.timeout</code></td>
@@ -250,6 +255,7 @@ SPARK_MASTER_OPTS supports the following system properties:
  <td>
    Amount of a particular resource to use on the worker.
  </td>
+  <td>3.0.0</td>
</tr>
<tr>
  <td><code>spark.worker.resource.{resourceName}.discoveryScript</code></td>
@@ -258,6 +264,7 @@ SPARK_MASTER_OPTS supports the following system properties:
    Path to resource discovery script, which is used to find a particular resource while worker starting up.
    And the output of the script should be formatted like the <code>ResourceInformation</code> class.
  </td>
+  <td>3.0.0</td>
</tr>
<tr>
  <td><code>spark.worker.resourcesFile</code></td>
@@ -317,6 +324,7 @@ SPARK_WORKER_OPTS supports the following system properties:
    enabled). You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
    eventually gets cleaned up. This config may be removed in the future.
  </td>
+  <td>3.0.0</td>
</tr>
<tr>
  <td><code>spark.storage.cleanupFilesAfterExecutorExit</code></td>
@@ -329,6 +337,7 @@ SPARK_WORKER_OPTS supports the following system properties:
    all files/subdirectories of a stopped and timeout application.
    This only affects Standalone mode, support of other cluster manangers can be added in the future.
  </td>
+  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.worker.ui.compressedLogFileLengthCacheSize</code></td>
@@ -490,14 +499,16 @@ ZooKeeper is the best way to go for production-level high availability, but if y
In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:

<table class="table">
-  <tr><th style="width:21%">System property</th><th>Meaning</th></tr>
+  <tr><th style="width:21%">System property</th><th>Meaning</th><th>Since Version</th></tr>
  <tr>
    <td><code>spark.deploy.recoveryMode</code></td>
    <td>Set to FILESYSTEM to enable single-node recovery mode (default: NONE).</td>
+    <td>0.8.1</td>
  </tr>
  <tr>
    <td><code>spark.deploy.recoveryDirectory</code></td>
    <td>The directory in which Spark will store recovery state, accessible from the Master's perspective.</td>
+    <td>0.8.1</td>
  </tr>
</table>

docs/sql-data-sources-avro.md

Lines changed: 17 additions & 4 deletions
@@ -258,21 +258,34 @@ Data source options of Avro can be set via:
## Configuration
Configuration of Avro can be done using the `setConf` method on SparkSession or by running `SET key=value` commands using SQL.
<table class="table">
-  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
  <tr>
    <td>spark.sql.legacy.replaceDatabricksSparkAvro.enabled</td>
    <td>true</td>
-    <td>If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped to the built-in but external Avro data source module for backward compatibility.</td>
+    <td>
+      If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped
+      to the built-in but external Avro data source module for backward compatibility.
+    </td>
+    <td>2.4.0</td>
  </tr>
  <tr>
    <td>spark.sql.avro.compression.codec</td>
    <td>snappy</td>
-    <td>Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate, snappy, bzip2 and xz. Default codec is snappy.</td>
+    <td>
+      Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate,
+      snappy, bzip2 and xz. Default codec is snappy.
+    </td>
+    <td>2.4.0</td>
  </tr>
  <tr>
    <td>spark.sql.avro.deflate.level</td>
    <td>-1</td>
-    <td>Compression level for the deflate codec used in writing of AVRO files. Valid value must be in the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level in the current implementation.</td>
+    <td>
+      Compression level for the deflate codec used in writing of AVRO files. Valid value must be in
+      the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level
+      in the current implementation.
+    </td>
+    <td>2.4.0</td>
  </tr>
</table>
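
The Avro properties gaining version cells above are ordinary session-level SQL configs. A minimal sketch in Scala, assuming the external spark-avro module is on the classpath; the app name and output path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: set the Avro configs documented above at runtime.
val spark = SparkSession.builder().appName("avro-config-sketch").getOrCreate()

// Equivalent to running `SET spark.sql.avro.compression.codec=deflate` in SQL.
spark.conf.set("spark.sql.avro.compression.codec", "deflate")
spark.conf.set("spark.sql.avro.deflate.level", "5")

// Subsequent Avro writes in this session use deflate level 5 instead of the snappy default.
spark.range(10).write.format("avro").mode("overwrite").save("/tmp/avro_config_example")
```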

docs/sql-data-sources-orc.md

Lines changed: 13 additions & 3 deletions
@@ -27,15 +27,25 @@ serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileF
the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.

<table class="table">
-  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
  <tr>
    <td><code>spark.sql.orc.impl</code></td>
    <td><code>native</code></td>
-    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support. <code>hive</code> means the ORC library in Hive.</td>
+    <td>
+      The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>.
+      <code>native</code> means the native ORC support. <code>hive</code> means the ORC library
+      in Hive.
+    </td>
+    <td>2.3.0</td>
  </tr>
  <tr>
    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    <td><code>true</code></td>
-    <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
+    <td>
+      Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>,
+      a new non-vectorized ORC reader is used in <code>native</code> implementation.
+      For <code>hive</code> implementation, this is ignored.
+    </td>
+    <td>2.3.0</td>
  </tr>
</table>
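
Like the Avro settings, these ORC properties can be switched per session. A minimal Scala sketch, assuming an existing `SparkSession` named `spark`; the input path is hypothetical:

```scala
// Minimal sketch: pick the ORC implementation and reader per session.
spark.conf.set("spark.sql.orc.impl", "native")
spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

// The same setting expressed through SQL, as the docs describe.
spark.sql("SET spark.sql.orc.impl=native")

// Reads in this session now go through the native, vectorized ORC reader.
val orcDf = spark.read.orc("/tmp/orc_config_example")
orcDf.show()
```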

docs/sql-data-sources-parquet.md

Lines changed: 8 additions & 1 deletion
@@ -258,7 +258,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`SET key=value` commands using SQL.

<table class="table">
-<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
<tr>
  <td><code>spark.sql.parquet.binaryAsString</code></td>
  <td>false</td>
@@ -267,6 +267,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
    not differentiate between binary data and strings when writing out the Parquet schema. This
    flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
  </td>
+  <td>1.1.1</td>
</tr>
<tr>
  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
@@ -275,6 +276,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
    Some Parquet-producing systems, in particular Impala and Hive, store Timestamp into INT96. This
    flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
  </td>
+  <td>1.3.0</td>
</tr>
<tr>
  <td><code>spark.sql.parquet.compression.codec</code></td>
@@ -287,11 +289,13 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
    Note that <code>zstd</code> requires <code>ZStandardCodec</code> to be installed before Hadoop 2.9.0, <code>brotli</code> requires
    <code>BrotliCodec</code> to be installed.
  </td>
+  <td>1.1.1</td>
</tr>
<tr>
  <td><code>spark.sql.parquet.filterPushdown</code></td>
  <td>true</td>
  <td>Enables Parquet filter push-down optimization when set to true.</td>
+  <td>1.2.0</td>
</tr>
<tr>
  <td><code>spark.sql.hive.convertMetastoreParquet</code></td>
@@ -300,6 +304,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
    When set to false, Spark SQL will use the Hive SerDe for parquet tables instead of the built in
    support.
  </td>
+  <td>1.1.1</td>
</tr>
<tr>
  <td><code>spark.sql.parquet.mergeSchema</code></td>
@@ -310,6 +315,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
    schema is picked from the summary file or a random data file if no summary file is available.
    </p>
  </td>
+  <td>1.5.0</td>
</tr>
<tr>
  <td><code>spark.sql.parquet.writeLegacyFormat</code></td>
@@ -321,5 +327,6 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
    example, decimals will be written in int-based format. If Parquet output is intended for use
    with systems that do not support this newer format, set to true.
  </td>
+  <td>1.6.0</td>
</tr>
</table>
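
The Parquet properties receiving version cells above are likewise runtime-settable, either through `spark.conf.set` or through `SET key=value` in SQL. A minimal Scala sketch, assuming an existing `SparkSession` named `spark`; paths are hypothetical:

```scala
// Minimal sketch: tune the Parquet configs documented above for one session.
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
spark.conf.set("spark.sql.parquet.filterPushdown", "true")
spark.conf.set("spark.sql.parquet.mergeSchema", "false")

// Or via `SET key=value` in SQL, as the docs describe.
spark.sql("SET spark.sql.parquet.binaryAsString=true")

// Round-trip a small DataFrame with the settings above in effect.
spark.range(100).write.mode("overwrite").parquet("/tmp/parquet_config_example")
val parquetDf = spark.read.parquet("/tmp/parquet_config_example")
parquetDf.show()
```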
