README.md (+6 −6)
@@ -339,13 +339,13 @@ This means we want the model to attend to every pixel over the course of generat
 
 ### Early stopping with BLEU
 
-To evaluate the model's performance on the validation set, we will use the automated evaluation metric: [BiLingual Evaluation Understudy (BLEU)](http://www.aclweb.org/anthology/P02-1040.pdf). This evaluates a generated caption against reference captions(s). For each generated caption, we will use all `N_c` captions available for that image as the reference captions.
+To evaluate the model's performance on the validation set, we will use the automated [BiLingual Evaluation Understudy (BLEU)](http://www.aclweb.org/anthology/P02-1040.pdf) evaluation metric. This evaluates a generated caption against reference caption(s). For each generated caption, we will use all `N_c` captions available for that image as the reference captions.
 
-The authors of the _Show, Attend and Tell_ paper observe that correlation between the loss and the BLEU score breaks down after a point, so they recommend to stop training early on when the BLEU score begins to degrade, even if the loss is on decreasing trend.
+The authors of the _Show, Attend and Tell_ paper observe that correlation between the loss and the BLEU score breaks down after a point, so they recommend to stop training early on when the BLEU score begins to degrade, even if the loss continues to decrease.
 
 I used the BLEU tool [available in the NLTK module](https://www.nltk.org/_modules/nltk/translate/bleu_score.html).
 
-Note that there is considerable criticism of the BLEU score because it doesn't correlate well with human judgments. The authors also report the METEOR scores for this reason, but I haven't implemented this metric.
+Note that there is considerable criticism of the BLEU score because it doesn't always correlate well with human judgment. The authors also report the METEOR scores for this reason, but I haven't implemented this metric.
 
 ### Remarks
 
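For context on the BLEU evaluation described in the hunk above, here is a minimal sketch of how NLTK's corpus-level BLEU can be applied to generated captions. The caption lists below are made-up placeholders, not data or code from this repository; in the tutorial, each image contributes all `N_c` tokenized reference captions and one tokenized generated caption.

```python
# Minimal sketch: corpus-level BLEU with NLTK's corpus_bleu.
# The captions below are invented placeholders for illustration only.
from nltk.translate.bleu_score import corpus_bleu

# One entry per image: a list of all N_c reference captions, each tokenized.
references = [
    [['a', 'dog', 'runs', 'on', 'the', 'beach'],
     ['a', 'brown', 'dog', 'running', 'along', 'the', 'shore']],
]
# One tokenized generated caption per image.
hypotheses = [['a', 'dog', 'runs', 'on', 'the', 'sand']]

# The default weights (0.25, 0.25, 0.25, 0.25) give the standard BLEU-4 score.
print(corpus_bleu(references, hypotheses))
```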
@@ -482,8 +482,8 @@ With the release of PyTorch `0.4`, wrapping tensors as `Variable`s is no longer
 - When a tensor is created from or modified using another tensor that allows gradients, then `requires_grad` will be set to `True`.
 - Tensors which are parameters of `torch.nn` layers will already have `requires_grad` set to `True`.
 
+---
 
+__How do I compute all BLEU (i.e. BLEU-1 to BLEU-4) scores during evaluation?__
 
-__How to compute all BLEU (i.e. BLEU-1 to BLEU-4) scores during evaluation?__
-
-For this you've to adapt the [eval.py script](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/master/eval.py#L171>). Please see the answer by [kmario23](<https://github.com/kmario23>) on [How to calculate all bleu scores (issues #37)](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/issues/37#issuecomment-455924998>) for a clear explanation of how to do this.
+You'd need to modify the code in [eval.py](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/master/eval.py#L171>) to do this. Please see [this excellent answer](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/issues/37#issuecomment-455924998>) by [kmario23](<https://github.com/kmario23>) for a detailed explanation.
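The `requires_grad` rules listed in the context lines of the hunk above can be verified with a few lines of PyTorch (`0.4` or later). This is an illustrative snippet, not code from the repository:

```python
# Illustrative check of the requires_grad rules above (PyTorch 0.4+).
import torch
from torch import nn

a = torch.ones(3)                       # a plain tensor does not track gradients
b = torch.ones(3, requires_grad=True)   # unless requested explicitly
c = a + b                               # derived from a gradient-tracking tensor
print(a.requires_grad, b.requires_grad, c.requires_grad)  # False True True

layer = nn.Linear(3, 2)                 # parameters of torch.nn layers
print(layer.weight.requires_grad)       # True by default
```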
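For the BLEU-1 to BLEU-4 question, the gist of the linked answer is to vary the n-gram weights passed to NLTK's `corpus_bleu`. The sketch below uses made-up captions purely for illustration; the actual adaptation would reuse the reference and hypothesis lists that eval.py already builds.

```python
# Sketch of computing BLEU-1 through BLEU-4 by varying the n-gram weights.
# The captions are placeholders; eval.py accumulates its own lists.
from nltk.translate.bleu_score import corpus_bleu

references = [[['a', 'cat', 'sits', 'on', 'a', 'mat']]]
hypotheses = [['a', 'cat', 'sits', 'on', 'the', 'mat']]

for name, weights in [
    ('BLEU-1', (1.0, 0.0, 0.0, 0.0)),
    ('BLEU-2', (0.5, 0.5, 0.0, 0.0)),
    ('BLEU-3', (1 / 3, 1 / 3, 1 / 3, 0.0)),
    ('BLEU-4', (0.25, 0.25, 0.25, 0.25)),
]:
    print(name, corpus_bleu(references, hypotheses, weights=weights))
```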