
Rank Evaluation Metric NDCG inconsistent between LGBMRanker.fit vs. LGBMRanker.predict() #6814

cjsombric opened this issue Feb 6, 2025 · 0 comments

Description

LGBMRanker's eval metric 'ndcg' does not match (1) the NDCG I compute by hand or (2) the NDCG computed by sklearn.metrics.ndcg_score.
I believe this is caused by different predictions being generated by the LGBMRanker.fit() evaluation and by LGBMRanker.predict().

Reproducible example

```python
from lightgbm import LGBMRanker
import pandas as pd

# Unable to include the training data:

X_train = []
y_train = []
train_groups = []

# Will include a small sample validation set:

X_ex_small = [] # too big to include
relevance_data = [0, 1, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_ex_small = pd.Series(relevance_data)
y_ex_small.index = ['123'] * len(y_ex_small)

val_ex_small = pd.Series([20], index=["123"])

model_gdbt = LGBMRanker(objective="lambdarank",
                        boosting_type="gbdt",
                        random_state=42,
                        max_depth=10,
                        min_data_in_leaf=200,
                        n_estimators=100,
                        subsample=0.5,
                        colsample_bytree=0.6,
                        lambda_l1=0.9,
                        lambda_l2=0.9,
                        n_jobs=-1)

model_gdbt.fit(X_train,
               y_train,
               group=train_groups,
               eval_set=[(X_train, y_train), (X_ex_small, y_ex_small)],
               eval_group=[train_groups, val_ex_small],
               eval_metric=['ndcg'],
               eval_at=[1, 5])

# lightgbm's NDCG for the validation set (X_ex_small, y_ex_small)

lightgbm_ndcg5 = model_gdbt.evals_result_['valid']['ndcg@5'][-1] # returns 0.9596358141506394

# sklearn NDCG for the validation set (X_ex_small, y_ex_small)

from sklearn.metrics import ndcg_score
y_ex_small_pred = model_gdbt.predict(X_ex_small) # returns [3.35057029, 3.46270607, 3.49636596, 3.13087368, 3.55823747, 3.28307494, 3.37924097, 2.84076022, 3.39463737, 3.51163148, 3.497507 , 3.29053029, 3.25150361, 3.12610702, 3.06495292, 2.70117685, 3.17200958, 3.52930069, 2.90185473, 3.51950751]
sklearn_ndcg5 = ndcg_score([y_ex_small.values], [y_ex_small_pred], k=5) # returns 0.8637574337885663

# Manual NDCG for the validation set (X_ex_small, y_ex_small)

# Visualize the relevance scores (y_ex_small) next to the predicted relevance scores (y_ex_small_pred) and sort by the predicted relevance
df_temp = y_ex_small.to_frame()
df_temp["pred"] = y_ex_small_pred
df_temp.sort_values(by="pred", ascending=False)

# returns:

#relevance | pred
#-- | --
#4 | 3.558237
#0 | 3.529301
#0 | 3.519508
#0 | 3.511631
#0 | 3.497507
#0 | 3.496366
#1 | 3.462706
#...

# Manual computation of ndcg@5, which matches sklearn but not lightgbm

from math import log2
dcg5 = ( 4 / log2(1 + 1))
idcg5 = ( 4 / log2(1 + 1)) + ( 1 / log2(1 + 2))
manual_ndcg5 = dcg5/idcg5 # returns 0.8637574337885663

# Thus manual_ndcg5 == sklearn_ndcg5

# BUT manual_ndcg5 != lightgbm_ndcg5

# To replicate lightgbm_ndcg5, I have to assume the predicted ranking placed the relevance-1 document in the 4th position...

from math import log2
dcg5_2replicate_lightgbm_ndcg5 = ( 4 / log2(1 + 1)) + ( 1 / log2(1 + 4))
idcg5_2replicate_lightgbm_ndcg5 = ( 4 / log2(1 + 1)) + ( 1 / log2(1 + 2))
manual_ndcg5_2replicate_lightgbm_ndcg5 = dcg5_2replicate_lightgbm_ndcg5/idcg5_2replicate_lightgbm_ndcg5 # returns 0.9596358141506394

```
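
Since the real training data cannot be included, here is a fully self-contained sketch of the same comparison on random data. Everything below (shapes, query sizes, hyperparameter values) is made up for illustration, so the exact NDCG values will differ from the ones reported above; it is only meant to give something runnable end to end.

```python
import numpy as np
from lightgbm import LGBMRanker
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(42)

# Synthetic training data: 50 queries with 20 documents each.
n_queries, docs_per_query, n_features = 50, 20, 10
X_train = rng.normal(size=(n_queries * docs_per_query, n_features))
y_train = rng.integers(0, 5, size=n_queries * docs_per_query)
train_groups = [docs_per_query] * n_queries

# One synthetic validation query with 20 documents.
X_val = rng.normal(size=(docs_per_query, n_features))
y_val = rng.integers(0, 5, size=docs_per_query)

model = LGBMRanker(objective="lambdarank", n_estimators=50, random_state=42)
model.fit(X_train,
          y_train,
          group=train_groups,
          eval_set=[(X_val, y_val)],
          eval_group=[[docs_per_query]],
          eval_metric=["ndcg"],
          eval_at=[1, 5])

# NDCG@5 as reported by LightGBM during fit() ...
lightgbm_ndcg5 = model.evals_result_["valid_0"]["ndcg@5"][-1]

# ... versus NDCG@5 recomputed with sklearn from predict() scores.
sklearn_ndcg5 = ndcg_score([y_val], [model.predict(X_val)], k=5)

print(lightgbm_ndcg5, sklearn_ndcg5)
```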

Environment info

LightGBM version or commit hash:
LightGBM: 4.5.0
sklearn: 1.5.2

Command(s) you used to install LightGBM

pip install lightgbm
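
The versions listed above can be double-checked with:

```python
import lightgbm
import sklearn

print(lightgbm.__version__)  # 4.5.0
print(sklearn.__version__)   # 1.5.2
```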

In summary, the reported performance (ndcg) differs depending on which LGBMRanker functions you use. I believe this discrepancy comes from differences in the predicted relevance scores generated by LGBMRanker.fit() vs. LGBMRanker.predict().
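
Two checks that might help narrow this down (a sketch only, re-using model_gdbt, X_ex_small, and y_ex_small from the reproducible example above; ndcg_at_k is a helper written for this issue, not a LightGBM function): (1) whether the scores from the sklearn wrapper, from raw_score=True, and from the underlying Booster actually differ, and (2) what NDCG@5 looks like under the two gain conventions, since as far as I can tell LightGBM's built-in ndcg uses label_gain (default 2**label - 1) while sklearn.metrics.ndcg_score uses the relevance value itself as the gain.

```python
import numpy as np

# (1) Do the scores themselves differ?  For lambdarank there is no output
#     transformation, so all three of these are expected to be identical.
pred = model_gdbt.predict(X_ex_small)
pred_raw = model_gdbt.predict(X_ex_small, raw_score=True)
pred_booster = model_gdbt.booster_.predict(X_ex_small)
print(np.allclose(pred, pred_raw), np.allclose(pred, pred_booster))

# (2) Recompute NDCG@5 from the same predictions under both gain conventions.
def ndcg_at_k(relevance, scores, k, gain):
    gains = np.array([gain(r) for r in np.asarray(relevance, dtype=float)])
    order = np.argsort(scores)[::-1]                         # rank by predicted score, descending
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))  # 1 / log2(rank + 1)
    dcg = (gains[order] * discounts)[:k].sum()
    idcg = (np.sort(gains)[::-1] * discounts)[:k].sum()
    return dcg / idcg

print(ndcg_at_k(y_ex_small.values, pred, k=5, gain=lambda r: r))           # linear gain, like sklearn
print(ndcg_at_k(y_ex_small.values, pred, k=5, gain=lambda r: 2 ** r - 1))  # LightGBM's default label_gain
```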

Additional Comments
