|
9 | 9 | #' method (`tree_method = "hist"`, which is the default algorithm), but is not usable for the |
10 | 10 | #' sorted-indices method (`tree_method = "exact"`), nor for the approximate method |
11 | 11 | #' (`tree_method = "approx"`). |
| 12 | +#' |
12 | 13 | #' @param data Data from which to create a DMatrix, which can then be used for fitting models or |
13 | 14 | #' for getting predictions out of a fitted model. |
14 | 15 | #' |
15 | | -#' Supported input types are as follows:\itemize{ |
16 | | -#' \item `matrix` objects, with types `numeric`, `integer`, or `logical`. |
17 | | -#' \item `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor`. |
| 16 | +#' Supported input types are as follows: |
| 17 | +#' - `matrix` objects, with types `numeric`, `integer`, or `logical`. |
| 18 | +#' - `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor`. |
18 | 19 | #' |
19 | 20 | #' Note that xgboost uses base-0 encoding for categorical types, hence `factor` types (which use base-1 |
20 | 21 | #' encoding) will be converted inside the function call. Be aware that the encoding used for `factor` |
|
23 | 24 | #' was constructed. |
24 | 25 | #' |
25 | 26 | #' Other column types are not supported. |
26 | | -#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`. |
27 | | -#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are **not** supported for |
28 | | -#' 'xgb.QuantileDMatrix'. |
29 | | -#' \item Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted |
30 | | -#' as a single row (only when making predictions from a fitted model). |
31 | | -#' \item Text files in a supported format, passed as a `character` variable containing the URI path to |
32 | | -#' the file, with an optional format specifier. |
33 | | -#' |
34 | | -#' These are **not** supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{ |
35 | | -#' \item XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()]. |
36 | | -#' \item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix |
37 | | -#' `?format=libsvm` at the end of the file path. It will be the default format if not |
38 | | -#' otherwise specified. |
39 | | -#' \item CSV files (comma-separated values). This format can be specified by adding suffix |
40 | | -#' `?format=csv` at the end of the file path. It will **not** be auto-deduced from file extensions. |
41 | | -#' } |
| 27 | +#' - CSR matrices, as class `dgRMatrix` from package `Matrix`. |
| 28 | +#' - CSC matrices, as class `dgCMatrix` from package `Matrix`. |
42 | 29 | #' |
43 | | -#' Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv', |
44 | | -#' it will not look at the extension or file contents to determine that it is a comma-separated value. |
45 | | -#' Instead, the format must be specified following the URI format, so the input to `data` should be passed |
46 | | -#' like this: `"file.csv?format=csv"` (or `"file.csv?format=csv&label_column=0"` if the first column |
47 | | -#' corresponds to the labels). |
| 30 | +#' These are **not** supported by `xgb.QuantileDMatrix`. |
| 31 | +#' - XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()]. |
| 32 | +#' - Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted |
| 33 | +#' as a single row (only when making predictions from a fitted model). |
48 | 34 | #' |
49 | | -#' For more information about passing text files as input, see the articles |
50 | | -#' \href{https://xgboost.readthedocs.io/en/stable/tutorials/input_format.html}{Text Input Format of DMatrix} and |
51 | | -#' \href{https://xgboost.readthedocs.io/en/stable/python/python_intro.html#python-data-interface}{Data Interface}. |
52 | | -#' } |
53 | 35 | #' @param label Label of the training data. For classification problems, should be passed encoded as |
54 | 36 | #' integers with numeration starting at zero. |
55 | 37 | #' @param weight Weight for each instance. |
|
95 | 77 | #' @param label_lower_bound Lower bound for survival training. |
96 | 78 | #' @param label_upper_bound Upper bound for survival training. |
97 | 79 | #' @param feature_weights Set feature weights for column sampling. |
98 | | -#' @param data_split_mode When passing a URI (as R `character`) as input, this signals |
99 | | -#' whether to split by row or column. Allowed values are `"row"` and `"col"`. |
100 | | -#' |
101 | | -#' In distributed mode, the file is split accordingly; otherwise this is only an indicator on |
102 | | -#' how the file was split beforehand. Default to row. |
103 | | -#' |
104 | | -#' This is not used when `data` is not a URI. |
105 | | -#' @return An 'xgb.DMatrix' object. If calling 'xgb.QuantileDMatrix', it will have additional |
106 | | -#' subclass 'xgb.QuantileDMatrix'. |
| 80 | +#' @param data_split_mode Not used yet. This parameter is for distributed training, which is not yet available for the R package. |
| 81 | +#' @return An `xgb.DMatrix` object. If calling `xgb.QuantileDMatrix`, it will have additional |
| 82 | +#' subclass `xgb.QuantileDMatrix`. |
107 | 83 | #' |
108 | 84 | #' @details |
109 | 85 | #' Note that DMatrix objects are not serializable through R functions such as [saveRDS()] or [save()]. |
@@ -145,6 +121,9 @@ xgb.DMatrix <- function( |
145 | 121 | if (!is.null(group) && !is.null(qid)) { |
146 | 122 | stop("Either one of 'group' or 'qid' should be NULL") |
147 | 123 | } |
| 124 | + if (data_split_mode != "row") { |
| 125 | + stop("'data_split_mode' is not supported yet.") |
| 126 | + } |
148 | 127 | nthread <- as.integer(NVL(nthread, -1L)) |
149 | 128 | if (typeof(data) == "character") { |
150 | 129 | if (length(data) > 1) { |
|
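The guard added in the second hunk can be tried in isolation. The sketch below is only an illustration: `check_data_split_mode()` is a hypothetical stand-alone helper, not a function in the package, and it assumes the argument arrives as a single string, as the comparison in the diff implies.

```r
# Hypothetical helper mirroring the check added to xgb.DMatrix() in this diff.
# It is not part of xgboost; it only reproduces the guard's logic.
check_data_split_mode <- function(data_split_mode = "row") {
  if (data_split_mode != "row") {
    stop("'data_split_mode' is not supported yet.")
  }
  invisible(data_split_mode)
}

# The default value passes silently.
check_data_split_mode("row")

# Any other value raises an error; tryCatch() captures its message.
msg <- tryCatch(
  check_data_split_mode("col"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

With this change, callers who previously passed `data_split_mode = "col"` would get an immediate error instead of having the value forwarded.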