[SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound

Smeb · HyukjinKwon · commit fa378567105e · 2020-03-31T15:16:17.000+09:00
### What changes were proposed in this pull request?

A small documentation change to clarify that the `rand()` function produces values in `[0.0, 1.0)`.

### Why are the changes needed?

`rand()` uses `Rand()`, which generates values in [0, 1) ([documented here](https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71)). The existing documentation suggests that 1.0 is a possible value returned by `rand()`. For a distribution written as `X ~ U(a, b)`, X can take the value a or b, so `U[0.0, 1.0]` suggests that the returned value could be 1.0.

### Does this PR introduce any user-facing change?

Only documentation changes.

### How was this patch tested?

Documentation changes only.

Closes apache#28071 from Smeb/master.

Authored-by: Ben Ryves <benjamin.ryves@getyourguide.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
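The exclusive upper bound matters in practice when a uniform sample is scaled into integer buckets with `floor(u * n)`. With `u` in `[0.0, 1.0)` the result always lies in `0..n-1`; a closed interval `[0.0, 1.0]` would also allow `n`. The sketch below illustrates that contract in plain Python. It uses the standard library's `random.random()`, which is documented to return floats in `[0.0, 1.0)`, as a stand-in for `rand()`. It is not Spark's generator, and the `bucket` helper is hypothetical.

```python
import random


def bucket(u: float, n: int) -> int:
    """Hypothetical helper: map a sample u in [0.0, 1.0) to an index in 0..n-1."""
    # Reject anything outside the half-open interval the docs now describe.
    if not 0.0 <= u < 1.0:
        raise ValueError(f"sample {u!r} is outside [0.0, 1.0)")
    # Because u < 1.0, u * n stays below n, so truncation yields at most n - 1.
    return int(u * n)


# Seeded stand-in for rand(seed=42); random.random() returns values in [0.0, 1.0).
rng = random.Random(42)
samples = [rng.random() for _ in range(10_000)]
buckets = [bucket(u, 10) for u in samples]

# Every sample respects the exclusive upper bound, and every bucket index is valid.
assert all(0.0 <= u < 1.0 for u in samples)
assert set(buckets) <= set(range(10))
```

Under the old `U[0.0, 1.0]` wording, a careful caller would have needed an extra clamp such as `min(int(u * n), n - 1)`. The corrected documentation makes that clamp unnecessary.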
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
@@ -2975,7 +2975,7 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"),
 
 #' @details
 #' \code{rand}: Generates a random column with independent and identically distributed (i.i.d.)
-#' samples from U[0.0, 1.0].
+#' samples uniformly distributed in [0.0, 1.0).
 #' Note: the function is non-deterministic in general case.
 #'
 #' @rdname column_nonaggregate_functions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
@@ -652,7 +652,7 @@ def percentile_approx(col, percentage, accuracy=10000):
 @since(1.4)
 def rand(seed=None):
     """Generates a random column with independent and identically distributed (i.i.d.) samples
-    from U[0.0, 1.0].
+    uniformly distributed in [0.0, 1.0).
 
     .. note:: The function is non-deterministic in general case.
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1227,7 +1227,7 @@ object functions {
 
   /**
    * Generate a random column with independent and identically distributed (i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
    *
    * @note The function is non-deterministic in general case.
    *
@@ -1238,7 +1238,7 @@ object functions {
 
   /**
    * Generate a random column with independent and identically distributed (i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
    *
    * @note The function is non-deterministic in general case.
    *