Commit 7d33154

authored
Update README.md
1 parent 1431f90 commit 7d33154

1 file changed: +4 -6 lines changed

README.md

@@ -125,20 +125,18 @@ Instead of the simple average, we use the _weighted_ average across all pixels,
 
 ### Attention
 
-Intuitively, what would you need to estimate the importance of different parts of the image?
+Intuitively, how would you estimate the importance of a certain part of an image?
 
-You would need to know how much of the sequence you have generated, so you could look at the image and decide what needs describing next. For example, you know that you have mentioned `a man` so far, but you look at the image and notice the aforementioned man is `holding` `a` `football`.
+You would need to be aware of the sequence you have generated _so far_, so you can look at the image and decide what needs describing next. For example, after you mention `a man`, the logical thing to do is to declare that he is `holding` `a` `football`.
 
-This is exactly what the attention mechanism does - it considers the sequence generated thus far, looks at the image, and _attends_ to the part of it that needs describing next.
+This is exactly what the Attention mechanism does - it considers the sequence generated thus far, and _attends_ to the part of the image that needs describing next.
 
 ![Attention](./img/att.png)
 <p align="center">
 *Attention*
 </p>
 
-We will use the _soft_ Attention, where the weights of the pixels add up to 1. You could interpret this as finding the probability that a certain pixel is _the_ important part of the image to generate the next word.
-
-(Funny story - when I was a kid growing up in India doing drills at school, the PE teacher would
+We will use _soft_ Attention, where the weights of the pixels add up to 1. You could interpret this as computing the probability that a certain pixel is _the_ important part of the image to generate the next word.
 
 ### Putting it all together
 
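
For readers of this change, the _soft_ Attention described in the updated paragraph can be illustrated with a minimal sketch. It is not the README's implementation: the function name `soft_attention`, the dot-product scoring, and the toy inputs are all assumptions made for illustration. The only properties taken from the text are that the pixel weights add up to 1 and that they feed a weighted average across all pixels.

```python
import math


def soft_attention(pixel_features, decoder_state):
    """Return (weights, context) for one decoding step.

    pixel_features: list of P feature vectors, one per pixel (D floats each).
    decoder_state:  D floats summarizing the sequence generated so far.
    NOTE: the dot-product score below is an assumed stand-in for whatever
    scoring function the actual model uses.
    """
    # Score each pixel against the current decoder state.
    scores = [
        sum(f * s for f, s in zip(feat, decoder_state))
        for feat in pixel_features
    ]

    # Softmax: subtract the max for numerical stability, then normalize
    # so that the weights are positive and add up to 1.
    max_score = max(scores)
    exps = [math.exp(score - max_score) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the pixel features, one value per feature dimension.
    dim = len(pixel_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, pixel_features))
        for d in range(dim)
    ]
    return weights, context


# Three hypothetical "pixels" with 2-dimensional features.
pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = soft_attention(pixels, state)
```

Because the weights form a probability distribution over pixels, each one can be read as the probability that its pixel is the important part of the image for generating the next word.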
