
Commit 201569e

Small cleanup for the objective interface and intercept document. (#11649)
- Clarify the code comments.
- Add test for the effect of intercept and base margin.
1 parent a8924d4 commit 201569e

12 files changed: +179 -91 lines changed


doc/parameter.rst

Lines changed: 1 addition & 1 deletion
@@ -419,7 +419,7 @@ Specify the learning task and the corresponding learning objective. The objectiv
 
 * ``base_score``
 
-  - The initial prediction score of all instances, global bias
+  - The initial prediction score of all instances, global bias.
   - The parameter is automatically estimated for selected objectives before training. To
     disable the estimation, specify a real number argument.
   - If ``base_margin`` is supplied, ``base_score`` will not be added.
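
A minimal sketch of the documented behaviour (the synthetic data and the value 0.3 are
illustrative, not part of the commit): passing a real number pins the intercept instead
of letting XGBoost estimate it.

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    y = X.sum(axis=1)

    # A real-number argument disables the automatic estimation of the intercept.
    booster = xgb.train(
        {"objective": "reg:squarederror", "base_score": 0.3},
        xgb.DMatrix(X, label=y),
        num_boost_round=8,
    )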

doc/tutorials/external_memory.rst

Lines changed: 1 addition & 3 deletions
@@ -274,9 +274,7 @@ floating point samples, `512` features (total 1TB) on a GH200 (a H200 GPU connec
 Grace CPU by a chip-to-chip link) system. One can start with:
 - Evenly divide the data into 128 batches with 8GB per batch.
 - Define a custom iterator as previously described.
-- Set the `max_quantile_batches` parameter of the
-  :py:class:`~xgboost.ExtMemQuantileDMatrix` to 32 (256GB per sub-stream for
-  quantization). Load the data.
+- Set the `max_quantile_batches` parameter of the :py:class:`~xgboost.ExtMemQuantileDMatrix` to 32 (256GB per sub-stream for quantization). Load the data.
 - Start training with ``device=cuda``.
 
 To run experiments on these platforms, the open source `NVIDIA Linux driver
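
A sketch of how the steps above fit together, assuming the data has already been
partitioned into ``.npz`` files (the loader and file names are illustrative, not part
of the commit):

.. code-block:: python

    import numpy as np
    import xgboost

    class Batches(xgboost.DataIter):
        """Iterate over pre-partitioned batches stored on disk."""

        def __init__(self, paths):
            self._paths = paths
            self._it = 0
            # Tell XGBoost where to place its external-memory cache.
            super().__init__(cache_prefix="cache")

        def next(self, input_data) -> bool:
            if self._it == len(self._paths):
                return False  # no more batches
            batch = np.load(self._paths[self._it])
            input_data(data=batch["X"], label=batch["y"])
            self._it += 1
            return True

        def reset(self) -> None:
            self._it = 0

    it = Batches([f"batch-{i}.npz" for i in range(128)])
    # Cap each quantization sub-stream at 32 batches (256GB with 8GB batches).
    Xy = xgboost.ExtMemQuantileDMatrix(it, max_quantile_batches=32)
    booster = xgboost.train({"device": "cuda"}, Xy, num_boost_round=10)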

doc/tutorials/intercept.rst

Lines changed: 55 additions & 1 deletion
@@ -136,4 +136,58 @@ We have:
     E[c_i] &= \exp{(F(x_i) + \ln{\gamma_i})} \\
     E[c_i] &= g^{-1}(F(x_i) + g(\gamma_i))
 
-As you can see, we can use the ``base_margin`` for modeling with offset similar to GLMs
+As you can see, we can use the ``base_margin`` for modeling with offset similar to GLMs.
+
+*******
+Example
+*******
+
+The following example shows the relationship between ``base_score`` and ``base_margin``
+using binary logistic with a `logit` link function:
+
+.. code-block:: python
+
+    import numpy as np
+    from scipy.special import logit
+    from sklearn.datasets import make_classification
+    from xgboost import train, DMatrix
+
+    X, y = make_classification(random_state=2025)
+
+The intercept is a valid probability (0.5). It's used as the initial estimation of the
+probability of obtaining a positive sample.
+
+.. code-block:: python
+
+    intercept = 0.5
+
+First, we use the intercept to train a model:
+
+.. code-block:: python
+
+    booster = train(
+        {"base_score": intercept, "objective": "binary:logistic"},
+        dtrain=DMatrix(X, y),
+        num_boost_round=1,
+    )
+    predt_0 = booster.predict(DMatrix(X, y))
+
+Apply :py:func:`~scipy.special.logit` to obtain the "margin":
+
+.. code-block:: python
+
+    margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
+    Xy = DMatrix(X, y, base_margin=margin)
+    # 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
+    booster = train(
+        {"base_score": 0.2, "objective": "binary:logistic"},
+        dtrain=Xy,
+        num_boost_round=1,
+    )
+    predt_1 = booster.predict(Xy)
+
+Compare the results:
+
+.. code-block:: python
+
+    np.testing.assert_allclose(predt_0, predt_1)
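
The two predictions agree because ``logit`` inverts the logistic inverse link, and
``logit(0.5)`` is exactly ``0.0``, so the supplied margin equals the raw offset that
``base_score=0.5`` produces. A quick check (not part of the commit):

.. code-block:: python

    from scipy.special import expit, logit

    p = 0.5
    assert logit(p) == 0.0       # the margin corresponding to the intercept
    assert expit(logit(p)) == p  # expit is the inverse of logit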

include/xgboost/learner.h

Lines changed: 16 additions & 13 deletions
@@ -11,7 +11,7 @@
 #include <dmlc/io.h>          // for Serializable
 #include <xgboost/base.h>     // for bst_feature_t, bst_target_t, bst_float, Args, GradientPair, ..
 #include <xgboost/context.h>  // for Context
-#include <xgboost/linalg.h>   // for Tensor, TensorView
+#include <xgboost/linalg.h>   // for Vector, VectorView
 #include <xgboost/metric.h>   // for Metric
 #include <xgboost/model.h>    // for Configurable, Model
 #include <xgboost/span.h>     // for Span
@@ -284,58 +284,61 @@ class Learner : public Model, public Configurable, public dmlc::Serializable {
 struct LearnerModelParamLegacy;
 
 /**
- * \brief Strategy for building multi-target models.
+ * @brief Strategy for building multi-target models.
  */
 enum class MultiStrategy : std::int32_t {
   kOneOutputPerTree = 0,
   kMultiOutputTree = 1,
 };
 
 /**
- * \brief Basic model parameters, used to describe the booster.
+ * @brief Basic model parameters, used to describe the booster.
  */
 struct LearnerModelParam {
  private:
   /**
-   * \brief Global bias, this is just a scalar value but can be extended to vector when we
+   * @brief Global bias, this is just a scalar value but can be extended to vector when we
    * support multi-class and multi-target.
+   *
+   * The value stored here is the value before applying the inverse link function, used
+   * for initializing the prediction matrix/vector.
    */
-  linalg::Tensor<float, 1> base_score_;
+  linalg::Vector<float> base_score_;
 
  public:
   /**
-   * \brief The number of features.
+   * @brief The number of features.
    */
   bst_feature_t num_feature{0};
   /**
-   * \brief The number of classes or targets.
+   * @brief The number of classes or targets.
    */
   std::uint32_t num_output_group{0};
   /**
-   * \brief Current task, determined by objective.
+   * @brief Current task, determined by objective.
    */
   ObjInfo task{ObjInfo::kRegression};
   /**
-   * \brief Strategy for building multi-target models.
+   * @brief Strategy for building multi-target models.
    */
   MultiStrategy multi_strategy{MultiStrategy::kOneOutputPerTree};
 
   LearnerModelParam() = default;
   // As the old `LearnerModelParamLegacy` is still used by binary IO, we keep
   // this one as an immutable copy.
   LearnerModelParam(Context const* ctx, LearnerModelParamLegacy const& user_param,
-                    linalg::Tensor<float, 1> base_margin, ObjInfo t, MultiStrategy multi_strategy);
+                    linalg::Vector<float> base_score, ObjInfo t, MultiStrategy multi_strategy);
   LearnerModelParam(LearnerModelParamLegacy const& user_param, ObjInfo t,
                     MultiStrategy multi_strategy);
-  LearnerModelParam(bst_feature_t n_features, linalg::Tensor<float, 1> base_score,
+  LearnerModelParam(bst_feature_t n_features, linalg::Vector<float> base_score,
                     std::uint32_t n_groups, bst_target_t n_targets, MultiStrategy multi_strategy)
       : base_score_{std::move(base_score)},
         num_feature{n_features},
         num_output_group{std::max(n_groups, n_targets)},
         multi_strategy{multi_strategy} {}
 
-  linalg::TensorView<float const, 1> BaseScore(Context const* ctx) const;
-  [[nodiscard]] linalg::TensorView<float const, 1> BaseScore(DeviceOrd device) const;
+  linalg::VectorView<float const> BaseScore(Context const* ctx) const;
+  [[nodiscard]] linalg::VectorView<float const> BaseScore(DeviceOrd device) const;
 
   void Copy(LearnerModelParam const& that);
   [[nodiscard]] bool IsVectorLeaf() const noexcept {
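
The renamed ``base_score`` vector is stored as a raw margin internally, while the saved
configuration exposes the user-facing value. A sketch of the ``get_basescore`` helper
used by the new test, assuming it reads the saved configuration (the in-repo definition
may differ):

.. code-block:: python

    import json
    import xgboost

    def get_basescore(booster: xgboost.Booster) -> float:
        """Read the user-facing intercept from the saved model configuration."""
        config = json.loads(booster.save_config())
        return float(config["learner"]["learner_model_param"]["base_score"])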

include/xgboost/objective.h

Lines changed: 43 additions & 39 deletions
@@ -1,8 +1,8 @@
 /**
- * Copyright 2014-2024, XGBoost Contributors
- * \file objective.h
- * \brief interface of objective function used by xgboost.
- * \author Tianqi Chen, Kailong Chen
+ * Copyright 2014-2025, XGBoost Contributors
+ *
+ * @brief interface of objective function used by xgboost.
+ * @author Tianqi Chen, Kailong Chen
  */
 #ifndef XGBOOST_OBJECTIVE_H_
 #define XGBOOST_OBJECTIVE_H_
@@ -11,19 +11,20 @@
 #include <xgboost/base.h>
 #include <xgboost/data.h>
 #include <xgboost/host_device_vector.h>
+#include <xgboost/linalg.h>  // for Vector
 #include <xgboost/model.h>
 #include <xgboost/task.h>
 
-#include <cstdint>  // std::int32_t
+#include <cstdint>  // for int32_t
 #include <functional>
-#include <string>
+#include <string>  // for string
 
 namespace xgboost {
 
 class RegTree;
 struct Context;
 
-/*! \brief interface of objective function */
+/** @brief The interface of objective function */
 class ObjFunction : public Configurable {
  protected:
   Context const* ctx_;
@@ -32,32 +33,30 @@ class ObjFunction : public Configurable {
   static constexpr float DefaultBaseScore() { return 0.5f; }
 
  public:
-  /*! \brief virtual destructor */
   ~ObjFunction() override = default;
-  /*!
-   * \brief Configure the objective with the specified parameters.
-   * \param args arguments to the objective function.
+  /**
+   * @brief Configure the objective with the specified parameters.
+   *
+   * @param args arguments to the objective function.
    */
   virtual void Configure(Args const& args) = 0;
   /**
    * @brief Get gradient over each of predictions, given existing information.
    *
-   * @param preds prediction of current round
-   * @param info information about labels, weights, groups in rank
+   * @param preds Raw prediction (before applying the inverse link) of the current round.
+   * @param info information about labels, weights, groups in rank.
    * @param iteration current iteration number.
    * @param out_gpair output of get gradient, saves gradient and second order gradient in
    */
   virtual void GetGradient(HostDeviceVector<float> const& preds, MetaInfo const& info,
                            std::int32_t iter, linalg::Matrix<GradientPair>* out_gpair) = 0;
 
-  /*! \return the default evaluation metric for the objective */
-  virtual const char* DefaultEvalMetric() const = 0;
+  /** @return the default evaluation metric for the objective */
+  [[nodiscard]] virtual const char* DefaultEvalMetric() const = 0;
   /**
-   * \brief Return the configuration for the default metric.
+   * @brief Return the configuration for the default metric.
    */
-  virtual Json DefaultMetricConfig() const { return Json{Null{}}; }
-
-  // the following functions are optional, most of time default implementation is good enough
+  [[nodiscard]] virtual Json DefaultMetricConfig() const { return Json{Null{}}; }
   /**
    * @brief Apply inverse link (activation) function to prediction values.
    *
@@ -75,25 +74,28 @@ class ObjFunction : public Configurable {
    */
   virtual void EvalTransform(HostDeviceVector<float>* io_preds) { this->PredTransform(io_preds); }
   /**
-   * @brief Apply link function to the intercept.
+   * @brief Apply the link function to the intercept.
    *
-   * This is used to transform user-set base_score back to margin used by gradient
-   * boosting
+   * This is an inverse of `PredTransform` for most of the objectives (if there's a
+   * valid inverse). It's used to transform user-set base_score back to margin used by
+   * gradient boosting. The method converts objective-based valid outputs like
+   * probability back to raw model outputs.
    *
    * @return transformed value
    */
   [[nodiscard]] virtual float ProbToMargin(float base_score) const { return base_score; }
   /**
-   * @brief Obtain the initial estimation of prediction.
+   * @brief Obtain the initial estimation of prediction (intercept).
    *
-   * The output in `base_score` represents prediction after apply the inverse link function.
+   * The output in `base_score` represents prediction after apply the inverse link function
+   * (valid prediction instead of raw).
    *
    * @param info MetaInfo that contains label.
    * @param base_score Output estimation.
    */
-  virtual void InitEstimation(MetaInfo const& info, linalg::Tensor<float, 1>* base_score) const;
-  /*!
-   * \brief Return task of this objective.
+  virtual void InitEstimation(MetaInfo const& info, linalg::Vector<float>* base_score) const;
+  /**
+   * @brief Return task of this objective.
    */
   [[nodiscard]] virtual struct ObjInfo Task() const = 0;
   /**
@@ -106,31 +108,33 @@ class ObjFunction : public Configurable {
     }
     return 1;
   }
+  /** @brief Getter of the context. */
+  [[nodiscard]] Context const* Ctx() const { return this->ctx_; }
 
   /**
-   * \brief Update the leaf values after a tree is built. Needed for objectives with 0
+   * @brief Update the leaf values after a tree is built. Needed for objectives with 0
    * hessian.
    *
    * Note that the leaf update is not well defined for distributed training as XGBoost
    * computes only an average of quantile between workers. This breaks when some leaf
    * have no sample assigned in a local worker.
    *
-   * \param position The leaf index for each rows.
-   * \param info MetaInfo providing labels and weights.
-   * \param learning_rate The learning rate for current iteration.
-   * \param prediction Model prediction after transformation.
-   * \param group_idx The group index for this tree, 0 when it's not multi-target or multi-class.
-   * \param p_tree Tree that needs to be updated.
+   * @param position The leaf index for each rows.
+   * @param info MetaInfo providing labels and weights.
+   * @param learning_rate The learning rate for current iteration.
+   * @param prediction Model prediction after transformation.
+   * @param group_idx The group index for this tree, 0 when it's not multi-target or multi-class.
+   * @param p_tree Tree that needs to be updated.
    */
   virtual void UpdateTreeLeaf(HostDeviceVector<bst_node_t> const& /*position*/,
                               MetaInfo const& /*info*/, float /*learning_rate*/,
                               HostDeviceVector<float> const& /*prediction*/,
                               std::int32_t /*group_idx*/, RegTree* /*p_tree*/) const {}
-
-  /*!
-   * \brief Create an objective function according to name.
-   * \param ctx Pointer to runtime parameters.
-   * \param name Name of the objective.
+  /**
+   * @brief Create an objective function according to the name.
+   *
+   * @param name Name of the objective.
+   * @param ctx Pointer to the context.
    */
   static ObjFunction* Create(const std::string& name, Context const* ctx);
 };
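
The clarified ``@param preds`` comment also describes the Python custom-objective
callback, which receives the raw, pre-inverse-link predictions. A minimal sketch using
the public ``obj`` hook (an analogue of this C++ interface, not the commit's code):

.. code-block:: python

    import numpy as np
    import xgboost as xgb
    from scipy.special import expit

    def logistic_obj(predt: np.ndarray, dtrain: xgb.DMatrix):
        """Binary logistic loss: derivatives w.r.t. the raw margin."""
        y = dtrain.get_label()
        p = expit(predt)  # inverse link: margin -> probability
        grad = p - y
        hess = p * (1.0 - p)
        return grad, hess

    rng = np.random.default_rng(0)
    X = rng.normal(size=(128, 4))
    y = (X[:, 0] > 0).astype(np.float32)
    booster = xgb.train(
        {"tree_method": "hist"},
        xgb.DMatrix(X, label=y),
        num_boost_round=4,
        obj=logistic_obj,
    )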

python-package/xgboost/testing/predict.py

Lines changed: 31 additions & 0 deletions
@@ -3,10 +3,12 @@
 from typing import Type
 
 import numpy as np
+from scipy.special import logit  # pylint: disable=no-name-in-module
 
 from ..core import DMatrix
 from ..training import train
 from .shared import validate_leaf_output
+from .updater import get_basescore
 from .utils import Device
 
 
@@ -63,3 +65,32 @@ def run_predict_leaf(device: Device, DMatrixT: Type[DMatrix]) -> np.ndarray:
     assert booster.predict(m, pred_leaf=True).shape == (rows,)
 
     return leaf
+
+
+def run_base_margin_vs_base_score(device: Device) -> None:
+    """Test for the relation between score and margin."""
+    from sklearn.datasets import make_classification
+
+    intercept = 0.5
+
+    X, y = make_classification(random_state=2025)
+    booster = train(
+        {"base_score": intercept, "objective": "binary:logistic", "device": device},
+        dtrain=DMatrix(X, y),
+        num_boost_round=1,
+    )
+    np.testing.assert_allclose(get_basescore(booster), intercept)
+    predt_0 = booster.predict(DMatrix(X, y))
+
+    margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
+    Xy = DMatrix(X, y, base_margin=margin)
+    # 0.2 is a dummy value
+    booster = train(
+        {"base_score": 0.2, "objective": "binary:logistic", "device": device},
+        dtrain=Xy,
+        num_boost_round=1,
+    )
+    np.testing.assert_allclose(get_basescore(booster), 0.2)
+    predt_1 = booster.predict(Xy)
+
+    np.testing.assert_allclose(predt_0, predt_1)
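
A usage sketch for the new helper, assuming a checkout where the testing module is
importable:

.. code-block:: python

    from xgboost.testing.predict import run_base_margin_vs_base_score

    run_base_margin_vs_base_score("cpu")   # "cuda" exercises the GPU path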
