
Commit e3e2421

Update README.md
1 parent e6b72d2 commit e3e2421

1 file changed (+6 -6 lines)


README.md (+6 -6)
@@ -339,13 +339,13 @@ This means we want the model to attend to every pixel over the course of generat
 
 ### Early stopping with BLEU
 
-To evaluate the model's performance on the validation set, we will use the automated evaluation metric: [BiLingual Evaluation Understudy (BLEU)](http://www.aclweb.org/anthology/P02-1040.pdf). This evaluates a generated caption against reference captions(s). For each generated caption, we will use all `N_c` captions available for that image as the reference captions.
+To evaluate the model's performance on the validation set, we will use the automated [BiLingual Evaluation Understudy (BLEU)](http://www.aclweb.org/anthology/P02-1040.pdf) evaluation metric. This evaluates a generated caption against reference caption(s). For each generated caption, we will use all `N_c` captions available for that image as the reference captions.
 
-The authors of the _Show, Attend and Tell_ paper observe that correlation between the loss and the BLEU score breaks down after a point, so they recommend to stop training early on when the BLEU score begins to degrade, even if the loss is on decreasing trend.
+The authors of the _Show, Attend and Tell_ paper observe that the correlation between the loss and the BLEU score breaks down after a point, so they recommend stopping training early when the BLEU score begins to degrade, even if the loss continues to decrease.
 
 I used the BLEU tool [available in the NLTK module](https://www.nltk.org/_modules/nltk/translate/bleu_score.html).
 
-Note that there is considerable criticism of the BLEU score because it doesn't correlate well with human judgments. The authors also report the METEOR scores for this reason, but I haven't implemented this metric.
+Note that there is considerable criticism of the BLEU score because it doesn't always correlate well with human judgment. The authors also report METEOR scores for this reason, but I haven't implemented this metric.
 
 ### Remarks
 
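The change above describes BLEU-based early stopping. Purely as an illustration (and not the repository's actual training code), here is a minimal sketch of how a corpus-level BLEU-4 score can be computed with NLTK's `corpus_bleu` and used to decide whether the model improved; the toy `references`/`hypotheses` lists and the `best_bleu4` / `epochs_since_improvement` bookkeeping are hypothetical placeholders.

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy placeholders: references[i] holds the N_c reference captions for image i
# (each caption a list of tokens); hypotheses[i] is the generated caption for image i.
references = [
    [["a", "dog", "runs", "."], ["a", "dog", "is", "running", "."]],
    [["two", "cats", "sleep", "."], ["cats", "are", "sleeping", "."]],
]
hypotheses = [
    ["a", "dog", "runs", "."],
    ["two", "cats", "are", "sleeping", "."],
]

# Corpus-level BLEU-4: the default weights give equal weight (0.25) to the
# 1-gram through 4-gram precisions.
bleu4 = corpus_bleu(references, hypotheses)

# Early stopping on BLEU rather than loss: track the best score seen so far and
# count epochs without improvement; training stops once this count grows too large.
best_bleu4, epochs_since_improvement = 0.0, 0
if bleu4 > best_bleu4:
    best_bleu4, epochs_since_improvement = bleu4, 0
else:
    epochs_since_improvement += 1
print(bleu4, epochs_since_improvement)
```

In a real training loop this check would run once per epoch over the full validation set.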
@@ -482,8 +482,8 @@ With the release of PyTorch `0.4`, wrapping tensors as `Variable`s is no longer
 - When a tensor is created from or modified using another tensor that allows gradients, then `requires_grad` will be set to `True`.
 - Tensors which are parameters of `torch.nn` layers will already have `requires_grad` set to `True`.
 
+---
 
+__How do I compute all BLEU (i.e. BLEU-1 to BLEU-4) scores during evaluation?__
 
-__How to compute all BLEU (i.e. BLEU-1 to BLEU-4) scores during evaluation?__
-
-For this you've to adapt the [eval.py script](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/master/eval.py#L171>). Please see the answer by [kmario23](<https://github.com/kmario23>) on [How to calculate all bleu scores (issues #37)](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/issues/37#issuecomment-455924998>) for a clear explanation of how to do this.
+You'd need to modify the code in [eval.py](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/master/eval.py#L171>) to do this. Please see [this excellent answer](<https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/issues/37#issuecomment-455924998>) by [kmario23](<https://github.com/kmario23>) for a detailed explanation.
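To make the answer linked above concrete, here is a rough sketch (not the actual contents of `eval.py`) of how BLEU-1 through BLEU-4 can all be obtained from the same `references` and `hypotheses` lists by varying only the n-gram weights passed to NLTK's `corpus_bleu`; the toy captions below are hypothetical placeholders for the lists accumulated over the validation set.

```python
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical placeholders for the lists built during the evaluation loop.
references = [[["a", "brown", "dog", "runs", "on", "the", "beach", "."]]]
hypotheses = [["a", "brown", "dog", "runs", "along", "the", "beach", "."]]

# BLEU-n uses uniform weights over the first n n-gram precisions.
bleu_weights = {
    "BLEU-1": (1.0, 0.0, 0.0, 0.0),
    "BLEU-2": (0.5, 0.5, 0.0, 0.0),
    "BLEU-3": (1 / 3, 1 / 3, 1 / 3, 0.0),
    "BLEU-4": (0.25, 0.25, 0.25, 0.25),
}

scores = {name: corpus_bleu(references, hypotheses, weights=w)
          for name, w in bleu_weights.items()}
print(scores)
```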
