Nel Report Precision, Recall, and F1 scores are unreplicable #68

Open
BenLambright opened this issue Aug 16, 2024 · 0 comments
Labels
🐛B Something isn't working

Comments

@BenLambright
Contributor

Bug Description

When I run `python evaluate.py preds@dbpedia-spotlight-wrapper@aapb-collaboration-21 golds`, I get the same counts of gold and system entities as the report, but not the same precision, recall, and F1 scores: those all come out as zero or near-zero numbers.
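
For reference, scores like these are typically computed from the overlap of the gold and system link sets; here is a minimal sketch, assuming exact-match set comparison (`score_links` is a hypothetical helper, not taken from evaluate.py):

```python
# Minimal sketch of set-based precision/recall/F1, assuming the evaluator
# compares gold and system NamedEntityLink objects via equality/hashing.
# If that equality contract changed, the intersection collapses and all
# three scores drop to (near) zero even when the entity counts match.
def score_links(golds: set, preds: set) -> tuple[float, float, float]:
    tp = len(golds & preds)  # true positives: links present in both sets
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(golds) if golds else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Under this reading, matching counts alongside zero scores are consistent: the counts only measure set sizes, while the scores depend on membership tests.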

Reproduction steps

  1. cd into nel_eval
  2. Remove the GUID cpb-aacip-507-nk3610wp6s from both the preds and the golds; its gold data is defunct and the script errors out otherwise (see the shell sketch after this list)
  3. Run `python evaluate.py preds@dbpedia-spotlight-wrapper@aapb-collaboration-21 golds`
  4. View the results
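
For convenience, the steps above as shell commands; the assumption that preds and golds are directories holding one file per GUID is mine, so adjust the removal step to the actual layout:

```sh
cd nel_eval
# Assumption: one file per GUID in each directory; adapt to the real layout.
rm preds@dbpedia-spotlight-wrapper@aapb-collaboration-21/cpb-aacip-507-nk3610wp6s*
rm golds/cpb-aacip-507-nk3610wp6s*
python evaluate.py preds@dbpedia-spotlight-wrapper@aapb-collaboration-21 golds
```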

Expected behavior

See the report for the expected behavior.

Log output

No response

Screenshots

No response

Additional context

I have tried several ways of comparing the golds and preds (hashing, string comparison, manual inspection), and as far as I can tell, the criteria by which evaluate.py builds and matches NamedEntityLink objects for the preds and golds must have changed between the version used for the report and the current one.
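
To make the suspicion concrete: if equality/hashing on the link objects now depends on an extra or differently normalized field, set intersection between golds and preds silently empties out. A hedged sketch with a hypothetical stand-in class (the field names are illustrative, not from this repo):

```python
from dataclasses import dataclass

# Hypothetical stand-in for NamedEntityLink; fields are illustrative only.
@dataclass(frozen=True)
class Link:
    guid: str
    span: tuple[int, int]
    grounding: str  # e.g. a DBpedia URI

gold = {Link("cpb-aacip-xxx", (0, 5), "http://dbpedia.org/resource/Boston")}
# Any drift in how a field is built or normalized (casing, trailing slash,
# offset convention) breaks both __eq__ and __hash__, so the sets no
# longer intersect even though they "look" the same when printed:
pred = {Link("cpb-aacip-xxx", (0, 5), "http://dbpedia.org/resource/boston")}
print(len(gold & pred))  # 0 -> precision, recall, and F1 all (near) zero
```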

@BenLambright BenLambright added the 🐛B Something isn't working label Aug 16, 2024
@clams-bot clams-bot added this to infra Aug 16, 2024
@github-project-automation github-project-automation bot moved this to Todo in infra Aug 16, 2024