
Commit af6e023

Merge remote-tracking branch 'upstream/master'
2 parents: 070d23f + 54930ec


61 files changed (+2048 lines, -316 lines)

.github/ISSUE_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Thanks for participating in the XGBoost community! We use https://discuss.xgboost.ai for any general usage questions and discussions. The issue tracker is used for actionable items such as feature proposals discussion, roadmaps, and bug tracking. You are always welcomed to post on the forum first :)
+Thanks for participating in the XGBoost community! The issue tracker is used for actionable items such as feature proposals discussion, roadmaps, and bug tracking.
 
 Issues that are inactive for a period of time may get closed. We adopt this policy so that we won't lose track of actionable issues that may fall at the bottom of the pile. Feel free to reopen a new one if you feel there is an additional problem that needs attention when an old one gets closed.
 

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -52,6 +52,7 @@ Debug
 *.bak
 #.Rbuildignore
 R-package.Rproj
+R-package/build/*
 *.cache*
 .mypy_cache/
 doxygen
@@ -144,11 +145,13 @@ credentials.csv
 .bloop
 
 # python tests
+*.bin
 demo/**/*.txt
 *.dmatrix
 .hypothesis
 __MACOSX/
 model*.json
+/tests/python/models/models/
 
 # R tests
 *.htm

R-package/NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ S3method(getinfo,xgb.Booster)
 S3method(getinfo,xgb.DMatrix)
 S3method(length,xgb.Booster)
 S3method(predict,xgb.Booster)
+S3method(predict,xgboost)
 S3method(print,xgb.Booster)
 S3method(print,xgb.DMatrix)
 S3method(print,xgb.cv.synchronous)

R-package/R/utils.R

Lines changed: 9 additions & 8 deletions
@@ -423,7 +423,7 @@ NULL
 #'
 #' @description
 #' When it comes to serializing XGBoost models, it's possible to use R serializers such as
-#' [save()] or [saveRDS()] to serialize an XGBoost R model, but XGBoost also provides
+#' [save()] or [saveRDS()] to serialize an XGBoost model object, but XGBoost also provides
 #' its own serializers with better compatibility guarantees, which allow loading
 #' said models in other language bindings of XGBoost.
 #'
@@ -451,23 +451,24 @@ NULL
 #' not used for prediction / importance / plotting / etc.
 #' These R attributes are only preserved when using R's serializers.
 #'
-#' In addition to the regular `xgb.Booster` objects producted by [xgb.train()], the
-#' function [xgboost()] produces a different subclass `xgboost`, which keeps other
-#' additional metadata as R attributes such as class names in classification problems,
-#' and which has a dedicated `predict` method that uses different defaults. XGBoost's
+#' In addition to the regular `xgb.Booster` objects produced by [xgb.train()], the
+#' function [xgboost()] produces objects with a different subclass `xgboost` (which
+#' inherits from `xgb.Booster`), which keeps other additional metadata as R attributes
+#' such as class names in classification problems, and which has a dedicated `predict`
+#' method that uses different defaults and takes different argument names. XGBoost's
 #' own serializers can work with this `xgboost` class, but as they do not keep R
 #' attributes, the resulting object, when deserialized, is downcasted to the regular
 #' `xgb.Booster` class (i.e. it loses the metadata, and the resulting object will use
-#' `predict.xgb.Booster` instead of `predict.xgboost`) - for these `xgboost` objects,
+#' [predict.xgb.Booster()] instead of [predict.xgboost()]) - for these `xgboost` objects,
 #' `saveRDS` might thus be a better option if the extra functionalities are needed.
 #'
 #' Note that XGBoost models in R starting from version `2.1.0` and onwards, and
 #' XGBoost models before version `2.1.0`; have a very different R object structure and
 #' are incompatible with each other. Hence, models that were saved with R serializers
 #' like [saveRDS()] or [save()] before version `2.1.0` will not work with latter
 #' `xgboost` versions and vice versa. Be aware that the structure of R model objects
-#' could in theory change again in the future, so XGBoost's serializers
-#' should be preferred for long-term storage.
+#' could in theory change again in the future, so XGBoost's serializers should be
+#' preferred for long-term storage.
 #'
 #' Furthermore, note that using the package `qs` for serialization will require
 #' version 0.26 or higher of said package, and will have the same compatibility

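The utils.R text above describes an `xgboost` subclass that inherits from `xgb.Booster` and has its own `predict` method, and the other hunks in this commit switch internal callers from `predict()` to `predict.xgb.Booster()`. A minimal base-R sketch of the S3 dispatch behind this, using hypothetical mock classes `mock_booster` and `mock_xgboost` (not the package's real objects or methods):

```r
# Hypothetical mock classes standing in for `xgb.Booster` and its `xgboost` subclass;
# these are NOT the package's real objects or methods.
predict.mock_booster <- function(object, ...) {
  "parent method"
}
predict.mock_xgboost <- function(object, ...) {
  "subclass method"
}

# An object of the subclass also carries the parent class, like `xgboost` objects
# that inherit from `xgb.Booster`.
child <- structure(list(), class = c("mock_xgboost", "mock_booster"))

# The generic dispatches on the first class that has a method: the subclass method.
via_generic <- predict(child)

# Calling the parent method by name skips the subclass method, so callers keep the
# parent's argument names and defaults even when given a subclass object.
via_parent <- predict.mock_booster(child)

# An object that only has the parent class (as after the downcast described above)
# falls back to the parent method.
downcast <- structure(list(), class = "mock_booster")
via_downcast <- predict(downcast)
```

Calling the parent method explicitly appears to be the point of the `predict.xgb.Booster()` changes in `xgb.create.features()` and `xgb.shap.data()`, since `predict.xgboost()` is documented as taking different argument names.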
R-package/R/xgb.Booster.R

Lines changed: 13 additions & 2 deletions
@@ -126,6 +126,8 @@ xgb.get.handle <- function(object) {
 #' of the iterations (rounds) otherwise.
 #'
 #' If passing "all", will use all of the rounds regardless of whether the model had early stopping or not.
+#'
+#' Not applicable to `gblinear` booster.
 #' @param strict_shape Whether to always return an array with the same dimensions for the given prediction mode
 #' regardless of the model type - meaning that, for example, both a multi-class and a binary classification
 #' model would generate output arrays with the same number of dimensions, with the 'class' dimension having
@@ -144,7 +146,13 @@ xgb.get.handle <- function(object) {
 #'
 #' If passing `TRUE`, then the result will have dimensions in reverse order - for example, rows
 #' will be the last dimensions instead of the first dimension.
-#' @param base_margin Base margin used for boosting from existing model.
+#' @param base_margin Base margin used for boosting from existing model (raw score that gets added to
+#' all observations independently of the trees in the model).
+#'
+#' If supplied, should be either a vector with length equal to the number of rows in `newdata`
+#' (for objectives which produces a single score per observation), or a matrix with number of
+#' rows matching to the number rows in `newdata` and number of columns matching to the number
+#' of scores estimated by the model (e.g. number of classes for multi-class classification).
 #'
 #' Note that, if `newdata` is an `xgb.DMatrix` object, this argument will
 #' be ignored as it needs to be added to the DMatrix instead (e.g. by passing it as
@@ -206,6 +214,9 @@ xgb.get.handle <- function(object) {
 #' For multi-class / multi-target, they will be arranged so that columns in the output will have
 #' the leafs from one group followed by leafs of the other group (e.g. order will be `group1:feat1`,
 #' `group1:feat2`, ..., `group2:feat1`, `group2:feat2`, ...).
+#'
+#' If there is more than one parallel tree (e.g. random forests), the parallel trees will be the
+#' last grouping in the resulting order, which will still be 2D.
 #' \item For `predcontrib`: when not multi-class / multi-target, a matrix with dimensions
 #' `[nrows, nfeats+1]`. The last "+ 1" column corresponds to the baseline value.
 #'
@@ -222,7 +233,7 @@ xgb.get.handle <- function(object) {
 #' For multi-class and multi-target, will be a 4D array with dimensions `[nrows, ngroups, nfeats+1, nfeats+1]`
 #' }
 #'
-#' If passing `strict_shape=FALSE`, the result is always an array:
+#' If passing `strict_shape=TRUE`, the result is always a matrix (if 2D) or array (if 3D or higher):
 #' - For normal predictions, the dimension is `[nrows, ngroups]`.
 #' - For `predcontrib=TRUE`, the dimension is `[nrows, ngroups, nfeats+1]`.
 #' - For `predinteraction=TRUE`, the dimension is `[nrows, ngroups, nfeats+1, nfeats+1]`.

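The new `base_margin` documentation in xgb.Booster.R allows either a vector with one entry per row of `newdata` (single-score objectives) or a matrix with one row per observation and one column per score. A small sketch of that shape rule as a hypothetical helper `base_margin_shape_ok()`; it is not part of the package, which performs its own validation:

```r
# Hypothetical helper (not in xgboost): does `base_margin` match the documented shape?
# n_rows   = number of rows in `newdata`
# n_scores = scores per observation (e.g. number of classes for multi-class)
base_margin_shape_ok <- function(base_margin, n_rows, n_scores) {
  if (is.null(dim(base_margin))) {
    # Plain vector: only described for objectives with a single score per row.
    return(n_scores == 1L && length(base_margin) == n_rows)
  }
  # Matrix: one row per observation, one column per score.
  length(dim(base_margin)) == 2L &&
    nrow(base_margin) == n_rows &&
    ncol(base_margin) == n_scores
}

# Regression / binary classification: one raw score per row.
ok_vector <- base_margin_shape_ok(rep(0, 5), n_rows = 5, n_scores = 1)
# Three-class problem: a 5 x 3 matrix of raw scores.
ok_matrix <- base_margin_shape_ok(matrix(0, nrow = 5, ncol = 3), n_rows = 5, n_scores = 3)
# A flat vector does not match the documented shape for three scores per row.
bad_vector <- base_margin_shape_ok(rep(0, 15), n_rows = 5, n_scores = 3)
```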
R-package/R/xgb.DMatrix.R

Lines changed: 16 additions & 37 deletions
@@ -9,12 +9,13 @@
 #' method (`tree_method = "hist"`, which is the default algorithm), but is not usable for the
 #' sorted-indices method (`tree_method = "exact"`), nor for the approximate method
 #' (`tree_method = "approx"`).
+#'
 #' @param data Data from which to create a DMatrix, which can then be used for fitting models or
 #' for getting predictions out of a fitted model.
 #'
-#' Supported input types are as follows:\itemize{
-#' \item `matrix` objects, with types `numeric`, `integer`, or `logical`.
-#' \item `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor`.
+#' Supported input types are as follows:
+#' - `matrix` objects, with types `numeric`, `integer`, or `logical`.
+#' - `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor`
 #'
 #' Note that xgboost uses base-0 encoding for categorical types, hence `factor` types (which use base-1
 #' encoding') will be converted inside the function call. Be aware that the encoding used for `factor`
@@ -23,33 +24,14 @@
 #' was constructed.
 #'
 #' Other column types are not supported.
-#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`.
-#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are **not** supported for
-#' 'xgb.QuantileDMatrix'.
-#' \item Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted
-#' as a single row (only when making predictions from a fitted model).
-#' \item Text files in a supported format, passed as a `character` variable containing the URI path to
-#' the file, with an optional format specifier.
-#'
-#' These are **not** supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{
-#' \item XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()].
-#' \item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
-#' `?format=libsvm` at the end of the file path. It will be the default format if not
-#' otherwise specified.
-#' \item CSV files (comma-separated values). This format can be specified by adding suffix
-#' `?format=csv` at the end ofthe file path. It will **not** be auto-deduced from file extensions.
-#' }
+#' - CSR matrices, as class `dgRMatrix` from package `Matrix`.
+#' - CSC matrices, as class `dgCMatrix` from package `Matrix`.
 #'
-#' Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
-#' it will not look at the extension or file contents to determine that it is a comma-separated value.
-#' Instead, the format must be specified following the URI format, so the input to `data` should be passed
-#' like this: `"file.csv?format=csv"` (or `"file.csv?format=csv&label_column=0"` if the first column
-#' corresponds to the labels).
+#' These are **not** supported by `xgb.QuantileDMatrix`.
+#' - XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()].
+#' - Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted
+#' as a single row (only when making predictions from a fitted model).
 #'
-#' For more information about passing text files as input, see the articles
-#' \href{https://xgboost.readthedocs.io/en/stable/tutorials/input_format.html}{Text Input Format of DMatrix} and
-#' \href{https://xgboost.readthedocs.io/en/stable/python/python_intro.html#python-data-interface}{Data Interface}.
-#' }
 #' @param label Label of the training data. For classification problems, should be passed encoded as
 #' integers with numeration starting at zero.
 #' @param weight Weight for each instance.
@@ -95,15 +77,9 @@
 #' @param label_lower_bound Lower bound for survival training.
 #' @param label_upper_bound Upper bound for survival training.
 #' @param feature_weights Set feature weights for column sampling.
-#' @param data_split_mode When passing a URI (as R `character`) as input, this signals
-#' whether to split by row or column. Allowed values are `"row"` and `"col"`.
-#'
-#' In distributed mode, the file is split accordingly; otherwise this is only an indicator on
-#' how the file was split beforehand. Default to row.
-#'
-#' This is not used when `data` is not a URI.
-#' @return An 'xgb.DMatrix' object. If calling 'xgb.QuantileDMatrix', it will have additional
-#' subclass 'xgb.QuantileDMatrix'.
+#' @param data_split_mode Not used yet. This parameter is for distributed training, which is not yet available for the R package.
+#' @return An 'xgb.DMatrix' object. If calling `xgb.QuantileDMatrix`, it will have additional
+#' subclass `xgb.QuantileDMatrix`.
 #'
 #' @details
 #' Note that DMatrix objects are not serializable through R functions such as [saveRDS()] or [save()].
@@ -145,6 +121,9 @@ xgb.DMatrix <- function(
 if (!is.null(group) && !is.null(qid)) {
 stop("Either one of 'group' or 'qid' should be NULL")
 }
+if (data_split_mode != "row") {
+stop("'data_split_mode' is not supported yet.")
+}
 nthread <- as.integer(NVL(nthread, -1L))
 if (typeof(data) == "character") {
 if (length(data) > 1) {

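The xgb.DMatrix.R documentation notes that R `factor` values use base-1 codes while XGBoost's categorical encoding is base-0, so factors are converted when the DMatrix is built. A base-R sketch of what such a shift amounts to; it only illustrates the idea and is not the package's internal conversion code:

```r
f <- factor(c("b", "a", "c", "a"))

# R stores factors as 1-based integer codes into levels(f), here c("a", "b", "c").
codes_r <- as.integer(f)

# Subtracting one gives 0-based category indices, the convention XGBoost uses.
codes_xgb <- codes_r - 1L

# The codes depend on the level order: the same values with differently
# ordered levels map to different category indices.
f_other_levels <- factor(c("b", "a", "c", "a"), levels = c("c", "b", "a"))
codes_other <- as.integer(f_other_levels) - 1L
```

Because the mapping follows `levels()`, factor columns passed at prediction time need the same levels, in the same order, as the data the model was trained on.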
R-package/R/xgb.create.features.R

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@
 #' @export
 xgb.create.features <- function(model, data, ...) {
 check.deprecation(...)
-pred_with_leaf <- predict(model, data, predleaf = TRUE)
+pred_with_leaf <- predict.xgb.Booster(model, data, predleaf = TRUE)
 cols <- lapply(as.data.frame(pred_with_leaf), factor)
 cbind(data, sparse.model.matrix(~ . -1, cols)) # nolint
 }

R-package/R/xgb.plot.shap.R

Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,7 @@
 #' @param target_class Only relevant for multiclass models. The default (`NULL`)
 #' averages the SHAP values over all classes. Pass a (0-based) class index
 #' to show only SHAP values of that class.
-#' @param approxcontrib Passed to `predict()` when `shap_contrib = NULL`.
+#' @param approxcontrib Passed to [predict.xgb.Booster()] when `shap_contrib = NULL`.
 #' @param subsample Fraction of data points randomly picked for plotting.
 #' The default (`NULL`) will use up to 100k data points.
 #' @param n_col Number of columns in a grid of plots.
@@ -353,7 +353,7 @@ xgb.shap.data <- function(data, shap_contrib = NULL, features = NULL, top_n = 1,
 }
 
 if (is.null(shap_contrib)) {
-shap_contrib <- predict(
+shap_contrib <- predict.xgb.Booster(
 model,
 newdata = data,
 predcontrib = TRUE,
