I tried to run the code smell detection experiment and found that both the macro F1 score and the micro F1 score are above 82%, but the paper reports 71.2% for CodeBERT. Could the authors help explain this difference?
Hi, that's quite a difference. I guess the other way around would be more problematic. We report the mean over several seeds, so with a lucky seed your result might be higher. Different hyper-parameters might also cause this. Did you use the same CodeBERT version, hyper-parameters, and pre-processing, and take the mean over 5 seeds?
Thank you very much for the reply. I used the default seeds in the code (100, 200, 300, 400, 500). Since the experiment records results every 20 steps, I selected the best checkpoint for each fold under each seed and averaged those at the end. For the model, I used microsoft/codebert-base rather than huggingface/CodeBERTa-small-v1 in order to evaluate the capabilities of CodeBERT itself, and I didn't change any other hyper-parameters. For data preprocessing, I followed the paper and did not apply any preprocessing, not even to code comments. Could the code comments cause the difference? I'm not sure.
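For reference, this is a minimal sketch of the aggregation I described above (not the repository's actual code; the `results` structure and the example scores are hypothetical):

```python
# Sketch of my aggregation: for each seed and each fold, take the best F1
# among the checkpoints evaluated every 20 steps, then average over folds
# and finally over seeds. The nested dict below is only an illustrative shape.
from statistics import mean

results = {
    100: {0: [0.78, 0.81, 0.83], 1: [0.80, 0.82]},  # seed -> fold -> F1 per eval step
    200: {0: [0.79, 0.84], 1: [0.81, 0.83]},
    # ... seeds 300, 400, 500 omitted here
}

per_seed = {
    seed: mean(max(step_scores) for step_scores in folds.values())
    for seed, folds in results.items()
}
overall = mean(per_seed.values())
print(f"Macro F1 averaged over seeds: {overall:.3f}")
```

If the paper instead reports the F1 at the final checkpoint (rather than the best checkpoint per fold), that selection step alone could explain part of the gap.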