Description
LGBMRanker's eval metric 'ndcg' does not match (1) the NDCG I generate by hand or (2) the NDCG generated by sklearn.metrics.ndcg_score.
I believe this is caused by LGBMRanker generating different predictions during the fit() evaluation step than predict() later returns.
Reproducible example
```python
from lightgbm import LGBMRanker
import pandas as pd
# Unable to include the training data, so placeholders are shown:
X_train = []
y_train = []
train_groups = []
# Will include a small sample validation set:
X_ex_small = [] # too big to include
relevance_data = [0, 1, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_ex_small = pd.Series(relevance_data)
y_ex_small.index = ['123'] * len(y_ex_small)
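# Group sizes for the validation eval set: all 20 rows of X_ex_small / y_ex_small belong to a single query ('123')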
val_ex_small = pd.Series([20], index=["123"])
model_gdbt = LGBMRanker(
    objective="lambdarank",
    boosting_type="gbdt",
    random_state=42,
    max_depth=10,
    min_data_in_leaf=200,
    n_estimators=100,
    subsample=0.5,
    colsample_bytree=0.6,
    lambda_l1=0.9,
    lambda_l2=0.9,
    n_jobs=-1,
)
model_gdbt.fit(
    X_train,
    y_train,
    group=train_groups,
    eval_set=[(X_train, y_train), (X_ex_small, y_ex_small)],
    eval_group=[train_groups, val_ex_small],
    eval_metric=["ndcg"],
    eval_at=[1, 5],
)
# LightGBM's NDCG for the validation set (X_ex_small, y_ex_small)
lightgbm_ndcg5 = model_gdbt.evals_result_['valid']['ndcg@5'][-1] # returns 0.9596358141506394
# sklearn's NDCG for the validation set (X_ex_small, y_ex_small)
from sklearn.metrics import ndcg_score
y_ex_small_pred = model_gdbt.predict(X_ex_small) # returns [3.35057029, 3.46270607, 3.49636596, 3.13087368, 3.55823747, 3.28307494, 3.37924097, 2.84076022, 3.39463737, 3.51163148, 3.497507 , 3.29053029, 3.25150361, 3.12610702, 3.06495292, 2.70117685, 3.17200958, 3.52930069, 2.90185473, 3.51950751]
sklearn_ndcg5 = ndcg_score([y_ex_small.values], [y_ex_small_pred], k=5) # returns 0.8637574337885663
# Manual NDCG for the validation set (X_ex_small, y_ex_small)
# Visualize the relevance scores (y_ex_small) next to the predicted relevance scores
# (y_ex_small_pred) and sort by the predicted relevance
df_temp = y_ex_small.to_frame()
df_temp["pred"] = y_ex_small_pred
df_temp.sort_values(by="pred", ascending=False)
# returns:
#relevance | pred
#-- | --
#4 | 3.558237
#0 | 3.529301
#0 | 3.519508
#0 | 3.511631
#0 | 3.497507
#0 | 3.496366
#1 | 3.462706
#...
# Manual computation of ndcg@5, which matches sklearn but not lightgbm
from math import log2
dcg5 = ( 4 / log2(1 + 1))
idcg5 = ( 4 / log2(1 + 1)) + ( 1 / log2(1 + 2))
manual_ndcg5 = dcg5/idcg5 # returns 0.8637574337885663
# Thus manual_ndcg5 == sklearn_ndcg5,
# BUT manual_ndcg5 != lightgbm_ndcg5.
# Trying to replicate lightgbm_ndcg5 by assuming the predicted ranking put the relevance-1
# document in the 4th position gets closer, but still does not reproduce it exactly:
dcg5_2replicate_lightgbm_ndcg5 = (4 / log2(1 + 1)) + (1 / log2(1 + 4))
idcg5_2replicate_lightgbm_ndcg5 = (4 / log2(1 + 1)) + (1 / log2(1 + 2))
manual_ndcg5_2replicate_lightgbm_ndcg5 = dcg5_2replicate_lightgbm_ndcg5 / idcg5_2replicate_lightgbm_ndcg5  # ~0.9568, still != lightgbm_ndcg5
```
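As a cross-check, below is a minimal, self-contained NDCG@k helper (`ndcg_at_k` is a hypothetical name, not a LightGBM or sklearn function). It scores a single query with either linear gains, which is what sklearn.metrics.ndcg_score and the manual computation above use, or exponential gains of 2**label - 1, which is what LightGBM's default `label_gain` parameter corresponds to. On the validation data above, the linear-gain variant should give roughly 0.8638 and the exponential-gain variant roughly 0.9596, so the gain convention may account for the difference between the two reported numbers.

```python
import numpy as np


def ndcg_at_k(relevance, scores, k, exponential_gain=False):
    """NDCG@k for a single query (hypothetical helper, not a library function).

    relevance: true relevance labels; scores: predicted scores for the same rows.
    exponential_gain=False -> linear gains (sklearn.metrics.ndcg_score behaviour);
    exponential_gain=True  -> gains of 2**label - 1 (LightGBM's default label_gain).
    """
    relevance = np.asarray(relevance, dtype=float)
    scores = np.asarray(scores, dtype=float)
    gains = 2.0 ** relevance - 1.0 if exponential_gain else relevance
    top_pred = np.argsort(-scores)[:k]       # ranking induced by the predictions
    top_ideal = np.argsort(-relevance)[:k]   # ideal ranking by true relevance
    discounts = 1.0 / np.log2(np.arange(2, 2 + len(top_pred)))  # 1 / log2(rank + 1)
    dcg = float(np.sum(gains[top_pred] * discounts))
    idcg = float(np.sum(gains[top_ideal] * discounts[: len(top_ideal)]))
    return dcg / idcg if idcg > 0 else 0.0


# On the small validation set above:
# ndcg_at_k(y_ex_small.values, y_ex_small_pred, k=5)                         # ~0.8638, matches sklearn
# ndcg_at_k(y_ex_small.values, y_ex_small_pred, k=5, exponential_gain=True)  # ~0.9596, matches evals_result_
```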
Environment info
LightGBM version or commit hash:
LightGBM: 4.5.0
sklearn: 1.5.2
Command(s) you used to install LightGBM
pip install lightgbm
In summary, the reported performance (NDCG) differs depending on which LGBMRanker functions you use. I believe this discrepancy comes from differences between the predicted relevance scores generated during LGBMRanker.fit() and those returned by LGBMRanker.predict().
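One way to test that hypothesis directly (a sketch I have not run) is to pass a callable eval metric to LGBMRanker.fit() so that sklearn's ndcg_score is computed on the exact scores LightGBM produces during training. If the callable's value agrees with ndcg_score applied to the predict() output but not with the built-in ndcg@5, then the fit-time and predict-time scores match and the difference comes from the metric definition itself. This assumes the (y_true, y_pred) custom-metric signature of the sklearn API and relies on the small eval set being a single query; `sklearn_ndcg5` is just an illustrative name.

```python
from sklearn.metrics import ndcg_score


def sklearn_ndcg5(y_true, y_pred):
    # Custom eval metric (sketch): sklearn's ndcg@5 on the scores produced during fit.
    # Only valid here because the small eval set is a single query of 20 documents;
    # a general version would also need the group boundaries.
    return "sklearn_ndcg@5", ndcg_score([y_true], [y_pred], k=5), True


model_gdbt.fit(
    X_train,
    y_train,
    group=train_groups,
    eval_set=[(X_ex_small, y_ex_small)],
    eval_group=[val_ex_small],
    eval_metric=["ndcg", sklearn_ndcg5],  # built-in ndcg plus the custom metric
    eval_at=[1, 5],
)
print(model_gdbt.evals_result_)  # compare 'ndcg@5' with 'sklearn_ndcg@5' for the eval set
```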
Additional Comments