I tried to run the code smell detection experiment and found that both the macro F1 score and the micro F1 score are above 82%, but the paper reports 71.2% for CodeBERT. Could the authors help explain this difference?
Hi, that's quite a difference. I guess the other way around would be more problematic. We report the mean over several seeds, so with a lucky seed your result might be higher. Different hyper-parameters might also cause this. Did you use the same CodeBERT version, hyper-parameters, and pre-processing, and take the mean over 5 seeds?
Thank you very much for the reply. I used the default seeds in the code (100, 200, 300, 400, 500). Since the experiment records results every 20 steps, I selected the best checkpoint for each fold under each seed and averaged those at the end. For the model, I used microsoft/codebert-base rather than huggingface/CodeBERTa-small-v1 in order to evaluate the capabilities of CodeBERT itself, and I didn't change any other hyper-parameters. For data preprocessing, I followed the paper and did not apply any preprocessing, not even to code comments. Could the code comments cause the difference? I'm not sure.
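For reference, this is a minimal sketch of the aggregation I described above (not the repository's actual code; the `results` structure and the example scores are hypothetical):

```python
# Sketch of my aggregation: for each seed and each fold, take the best F1
# among the checkpoints evaluated every 20 steps, then average over folds
# and finally over seeds. The nested dict below is only an illustrative shape.
from statistics import mean

results = {
    100: {0: [0.78, 0.81, 0.83], 1: [0.80, 0.82]},  # seed -> fold -> F1 per eval step
    200: {0: [0.79, 0.84], 1: [0.81, 0.83]},
    # ... seeds 300, 400, 500 omitted here
}

per_seed = {
    seed: mean(max(step_scores) for step_scores in folds.values())
    for seed, folds in results.items()
}
overall = mean(per_seed.values())
print(f"Macro F1 averaged over seeds: {overall:.3f}")
```

If the paper instead reports the F1 at the final checkpoint (rather than the best checkpoint per fold), that selection step alone could explain part of the gap.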