F1 Score for Structured/Beer on paper can't be reproduced #30

Open
junwei-h opened this issue Dec 16, 2022 · 0 comments

Hello,
I ran the code on Windows without a GPU and with --fp16 turned off:
python.exe .\train_ditto.py '--task' 'Structured/Beer' '--batch_size' '32' '--max_len' '256' '--lr' '3e-5' '--n_epochs' '40' '--lm' 'roberta' '--da' 'del' '--dk' 'product' '--summarize'

The RoBERTa checkpoint is roberta-base from https://huggingface.co/roberta-base

The paper says "We use the base uncased variant of each model in all our experiments", but RoBERTa is case-sensitive.
So which uncased variant of RoBERTa was used?
And which uncased variant of XLNet was used?

The paper reports an F1 of 94.34%, but the best I got is:
epoch 14: dev_f1=0.896551724137931, f1=0.8666666666666666, best_f1=0.9032258064516129
Any suggestions on what may have caused this lower performance?

Thank you.
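
A side note on how coarse these scores are: F1 = 2TP / (2TP + FP + FN), so on a small evaluation set every F1 value is a ratio of small integers. The sketch below recovers those ratios from the logged values with Python's `fractions` module. The helper names `f1_as_ratio` and `smallest_counts` and the denominator cap of 100 are assumptions made for illustration. A fraction in lowest terms can hide a common factor of the true counts, so the recovered numbers are only the smallest consistent ones, not confirmed counts from the dataset.

```python
from fractions import Fraction

def f1_as_ratio(f1, max_denominator=100):
    """Return the simplest fraction close to a logged F1 value.

    Assumption: the evaluation set is small enough that
    2TP + FP + FN stays below max_denominator.
    """
    return Fraction(f1).limit_denominator(max_denominator)

def smallest_counts(f1):
    """Smallest (TP, FP + FN) consistent with F1 = 2TP / (2TP + FP + FN)."""
    ratio = f1_as_ratio(f1)
    num, den = ratio.numerator, ratio.denominator
    if num % 2:  # 2TP must be even, so scale odd numerators by 2
        num, den = 2 * num, 2 * den
    return num // 2, den - num

# Test F1 values copied from the log in this issue.
for f1 in (0.9333333333333333, 0.9032258064516129, 0.8666666666666666):
    tp, errors = smallest_counts(f1)
    print(f"f1={f1:.4f} ~ {f1_as_ratio(f1)} -> TP={tp}, FP+FN={errors}")
```

If these values all come from the same test set, the ratio 28/31 would suggest roughly 14 true matches, in which case one extra misclassified pair moves F1 by about 3 points. This is an inference from the numbers, not something stated in the issue.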

Here is the output:
step: 0, loss: 0.5871710777282715
epoch 1: dev_f1=0.37931034482758624, f1=0.36666666666666664, best_f1=0.36666666666666664
step: 0, loss: 0.2969485819339752
epoch 2: dev_f1=0.2745098039215686, f1=0.2692307692307693, best_f1=0.36666666666666664
step: 0, loss: 0.2463674694299698
epoch 3: dev_f1=0.32558139534883723, f1=0.32499999999999996, best_f1=0.36666666666666664
step: 0, loss: 0.5062930583953857
epoch 4: dev_f1=0.32558139534883723, f1=0.32499999999999996, best_f1=0.36666666666666664
step: 0, loss: 0.2536587119102478
epoch 5: dev_f1=0.4117647058823529, f1=0.36923076923076925, best_f1=0.36923076923076925
step: 0, loss: 0.3347562551498413
epoch 6: dev_f1=0.6923076923076924, f1=0.6470588235294117, best_f1=0.6470588235294117
step: 0, loss: 0.3830795884132385
epoch 7: dev_f1=0.8275862068965518, f1=0.6666666666666665, best_f1=0.6666666666666665
step: 0, loss: 0.27009156346321106
epoch 8: dev_f1=0.8387096774193549, f1=0.9333333333333333, best_f1=0.9333333333333333
step: 0, loss: 0.13321542739868164
epoch 9: dev_f1=0.8666666666666666, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.024025270715355873
epoch 10: dev_f1=0.8666666666666666, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.0391874834895134
epoch 11: dev_f1=0.896551724137931, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.00302126444876194
epoch 12: dev_f1=0.8387096774193549, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.06331554800271988
epoch 13: dev_f1=0.8666666666666666, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.026920529082417488
epoch 14: dev_f1=0.896551724137931, f1=0.8666666666666666, best_f1=0.9032258064516129
step: 0, loss: 0.023745562881231308
epoch 15: dev_f1=0.8666666666666666, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.012241823598742485
epoch 16: dev_f1=0.8666666666666666, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.0017187324119731784
epoch 17: dev_f1=0.8666666666666666, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.0006802910938858986
epoch 18: dev_f1=0.8484848484848484, f1=0.8484848484848484, best_f1=0.9032258064516129
step: 0, loss: 0.0009096315479837358
epoch 19: dev_f1=0.8387096774193549, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.0005167351919226348
epoch 20: dev_f1=0.8387096774193549, f1=0.9032258064516129, best_f1=0.9032258064516129
step: 0, loss: 0.0003216741606593132
epoch 21: dev_f1=0.8387096774193549, f1=0.9032258064516129, best_f1=0.9032258064516129
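
The `best_f1` column can look puzzling, since it drops from 0.9333 at epoch 8 to 0.9032 at epoch 9. The numbers are consistent with `best_f1` being the test F1 at the epoch with the highest dev F1 so far, rather than the maximum test F1. That rule is inferred from this log only and is not confirmed against `train_ditto.py`. The sketch below replays it on a few lines copied from the log above; `parse_log` and `replay_selection` are hypothetical helper names.

```python
import re

# A few lines copied from the log above (epochs 7-9, 11, and 14).
LOG = """\
epoch 7: dev_f1=0.8275862068965518, f1=0.6666666666666665, best_f1=0.6666666666666665
epoch 8: dev_f1=0.8387096774193549, f1=0.9333333333333333, best_f1=0.9333333333333333
epoch 9: dev_f1=0.8666666666666666, f1=0.9032258064516129, best_f1=0.9032258064516129
epoch 11: dev_f1=0.896551724137931, f1=0.9032258064516129, best_f1=0.9032258064516129
epoch 14: dev_f1=0.896551724137931, f1=0.8666666666666666, best_f1=0.9032258064516129
"""

PATTERN = re.compile(
    r"epoch (\d+): dev_f1=([\d.]+), f1=([\d.]+), best_f1=([\d.]+)"
)

def parse_log(text):
    """Return (epoch, dev_f1, test_f1, logged_best_f1) tuples."""
    rows = []
    for m in PATTERN.finditer(text):
        epoch, dev, test, best = m.groups()
        rows.append((int(epoch), float(dev), float(test), float(best)))
    return rows

def replay_selection(rows):
    """Replay the assumed rule: keep the test F1 from the epoch whose
    dev F1 is strictly higher than that of every earlier epoch."""
    best_dev, best_test, replayed = -1.0, None, []
    for epoch, dev, test, _ in rows:
        if dev > best_dev:
            best_dev, best_test = dev, test
        replayed.append((epoch, best_test))
    return replayed

rows = parse_log(LOG)
replayed = replay_selection(rows)
for (epoch, best_test), row in zip(replayed, rows):
    print(epoch, best_test, "matches log:", best_test == row[3])
```

Under this reading, the 0.9333 test F1 at epoch 8 is not reported as the final score because the dev F1 kept improving afterwards, so the comparison with the paper's number depends on which selection rule the paper used.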
