
Rank Evaluation Metric NDCG inconsistent between LGBMRanker.fit vs. LGBMRanker.predict() #6814

cjsombric opened this issue Feb 6, 2025 · 0 comments

Description

LGBMRanker's eval metric 'ndcg' does not match (1) the NDCG I compute by hand or (2) the NDCG computed by sklearn.metrics.ndcg_score.
I believe this is caused by different predictions being generated by the LGBMRanker.fit() evaluation and by LGBMRanker.predict().

Reproducible example

```python
from lightgbm import LGBMRanker
import pandas as pd

# Unable to include the training data:

X_train = []
y_train = []
train_groups = []

# Will include a small sample validation set:

X_ex_small = [] # too big to include
relevance_data = [0, 1, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_ex_small = pd.Series(relevance_data)
y_ex_small.index = ['123'] * len(y_ex_small)

val_ex_small = pd.Series([20], index=["123"])

model_gdbt = LGBMRanker(objective="lambdarank",
                        boosting_type="gbdt",
                        random_state=42,
                        max_depth=10,
                        min_data_in_leaf=200,
                        n_estimators=100,
                        subsample=0.5,
                        colsample_bytree=0.6,
                        lambda_l1=0.9,
                        lambda_l2=0.9,
                        n_jobs=-1)

model_gdbt.fit(X_train,
               y_train,
               group=train_groups,
               eval_set=[(X_train, y_train), (X_ex_small, y_ex_small)],
               eval_group=[train_groups, val_ex_small],
               eval_metric=['ndcg'],
               eval_at=[1, 5])

# lightgbm's NDCG for the validation set (X_ex_small, y_ex_small)

lightgbm_ndcg5 = model_gdbt.evals_result_['valid']['ndcg@5'][-1] # returns 0.9596358141506394

# sklearn NDCG for the validation set (X_ex_small, y_ex_small)

from sklearn.metrics import ndcg_score
y_ex_small_pred = model_gdbt.predict(X_ex_small) # returns [3.35057029, 3.46270607, 3.49636596, 3.13087368, 3.55823747, 3.28307494, 3.37924097, 2.84076022, 3.39463737, 3.51163148, 3.497507 , 3.29053029, 3.25150361, 3.12610702, 3.06495292, 2.70117685, 3.17200958, 3.52930069, 2.90185473, 3.51950751]
sklearn_ndcg5 = ndcg_score([y_ex_small.values], [y_ex_small_pred], k=5) # returns 0.8637574337885663

# Manual NDCG for the validation set (X_ex_small, y_ex_small)

# Visualize the relevance scores (y_ex_small) next to the predicted relevance scores (y_ex_small_pred) and sort by the predicted relevance
df_temp = y_ex_small.to_frame()
df_temp["pred"] = y_ex_small_pred
df_temp.sort_values(by="pred", ascending=False)

# returns:

#relevance | pred
#-- | --
#4 | 3.558237
#0 | 3.529301
#0 | 3.519508
#0 | 3.511631
#0 | 3.497507
#0 | 3.496366
#1 | 3.462706
#...

# Manual computation of ndcg@5, which matches sklearn but not lightgbm

from math import log2
dcg5 = ( 4 / log2(1 + 1))
idcg5 = ( 4 / log2(1 + 1)) + ( 1 / log2(1 + 2))
manual_ndcg5 = dcg5/idcg5 # returns 0.8637574337885663

# Thus manual_ndcg5 == sklearn_ndcg5

# BUT manual_ndcg5 != lightgbm_ndcg5

# To replicate lightgbm_ndcg5, I have to assume the predicted ranking placed the relevance-1 document in the 4th position...

from math import log2
dcg5_2replicate_lightgbm_ndcg5 = ( 4 / log2(1 + 1)) + ( 1 / log2(1 + 4))
idcg5_2replicate_lightgbm_ndcg5 = ( 4 / log2(1 + 1)) + ( 1 / log2(1 + 2))
manual_ndcg5_2replicate_lightgbm_ndcg5 = dcg5_2replicate_lightgbm_ndcg5/idcg5_2replicate_lightgbm_ndcg5 # returns 0.9596358141506394

```
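
Since the real training data cannot be included, here is a fully self-contained sketch of the same comparison on random data. Everything below (shapes, query sizes, hyperparameter values) is made up for illustration, so the exact NDCG values will differ from the ones reported above; it is only meant to give something runnable end to end.

```python
import numpy as np
from lightgbm import LGBMRanker
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(42)

# Synthetic training data: 50 queries with 20 documents each.
n_queries, docs_per_query, n_features = 50, 20, 10
X_train = rng.normal(size=(n_queries * docs_per_query, n_features))
y_train = rng.integers(0, 5, size=n_queries * docs_per_query)
train_groups = [docs_per_query] * n_queries

# One synthetic validation query with 20 documents.
X_val = rng.normal(size=(docs_per_query, n_features))
y_val = rng.integers(0, 5, size=docs_per_query)

model = LGBMRanker(objective="lambdarank", n_estimators=50, random_state=42)
model.fit(X_train,
          y_train,
          group=train_groups,
          eval_set=[(X_val, y_val)],
          eval_group=[[docs_per_query]],
          eval_metric=["ndcg"],
          eval_at=[1, 5])

# NDCG@5 as reported by LightGBM during fit() ...
lightgbm_ndcg5 = model.evals_result_["valid_0"]["ndcg@5"][-1]

# ... versus NDCG@5 recomputed with sklearn from predict() scores.
sklearn_ndcg5 = ndcg_score([y_val], [model.predict(X_val)], k=5)

print(lightgbm_ndcg5, sklearn_ndcg5)
```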

Environment info

LightGBM version or commit hash:
LightGBM: 4.5.0
sklearn: 1.5.2

Command(s) you used to install LightGBM

pip install lightgbm
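
The versions listed above can be double-checked with:

```python
import lightgbm
import sklearn

print(lightgbm.__version__)  # 4.5.0
print(sklearn.__version__)   # 1.5.2
```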

In summary, the reported performance (ndcg) differs depending on which LGBMRanker functions you use. I believe this discrepancy comes from differences in the predicted relevance scores generated by LGBMRanker.fit() vs. LGBMRanker.predict().
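
Two checks that might help narrow this down (a sketch only, re-using model_gdbt, X_ex_small, and y_ex_small from the reproducible example above; ndcg_at_k is a helper written for this issue, not a LightGBM function): (1) whether the scores from the sklearn wrapper, from raw_score=True, and from the underlying Booster actually differ, and (2) what NDCG@5 looks like under the two gain conventions, since as far as I can tell LightGBM's built-in ndcg uses label_gain (default 2**label - 1) while sklearn.metrics.ndcg_score uses the relevance value itself as the gain.

```python
import numpy as np

# (1) Do the scores themselves differ?  For lambdarank there is no output
#     transformation, so all three of these are expected to be identical.
pred = model_gdbt.predict(X_ex_small)
pred_raw = model_gdbt.predict(X_ex_small, raw_score=True)
pred_booster = model_gdbt.booster_.predict(X_ex_small)
print(np.allclose(pred, pred_raw), np.allclose(pred, pred_booster))

# (2) Recompute NDCG@5 from the same predictions under both gain conventions.
def ndcg_at_k(relevance, scores, k, gain):
    gains = np.array([gain(r) for r in np.asarray(relevance, dtype=float)])
    order = np.argsort(scores)[::-1]                         # rank by predicted score, descending
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))  # 1 / log2(rank + 1)
    dcg = (gains[order] * discounts)[:k].sum()
    idcg = (np.sort(gains)[::-1] * discounts)[:k].sum()
    return dcg / idcg

print(ndcg_at_k(y_ex_small.values, pred, k=5, gain=lambda r: r))           # linear gain, like sklearn
print(ndcg_at_k(y_ex_small.values, pred, k=5, gain=lambda r: 2 ** r - 1))  # LightGBM's default label_gain
```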

Additional Comments
