
Commit 4092300

Merge branch 'master' of github.com:cs231n/cs231n.github.io
2 parents 8f57976 + d143162

1 file changed: +1 -1 lines changed

Diff for: linear-classify.md (+1 -1)
@@ -277,7 +277,7 @@ $$
 
 can be interpreted as the (normalized) probability assigned to the correct label \\(y\_i\\) given the image \\(x\_i\\) and parameterized by \\(W\\). To see this, remember that the Softmax classifier interprets the scores inside the output vector \\(f\\) as the unnormalized log probabilities. Exponentiating these quantities therefore gives the (unnormalized) probabilities, and the division performs the normalization so that the probabilities sum to one. In the probabilistic interpretation, we are therefore minimizing the negative log likelihood of the correct class, which can be interpreted as performing *Maximum Likelihood Estimation* (MLE). A nice feature of this view is that we can now also interpret the regularization term \\(R(W)\\) in the full loss function as coming from a Gaussian prior over the weight matrix \\(W\\), where instead of MLE we are performing the *Maximum a posteriori* (MAP) estimation. We mention these interpretations to help your intuitions, but the full details of this derivation are beyond the scope of this class.
 
-**Practical issues: Numeric stability**. When you're writing code for computing the Softmax function in pratice, the intermediate terms \\(e^{f\_{y\_i}}\\) and \\(\sum\_j e^{f\_j}\\) may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick. Notice that if we multiply the top and bottom of the fraction by a constant \\(C\\) and push it into the sum, we get the following (mathematically equivalent) expression:
+**Practical issues: Numeric stability**. When you're writing code for computing the Softmax function in practice, the intermediate terms \\(e^{f\_{y\_i}}\\) and \\(\sum\_j e^{f\_j}\\) may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick. Notice that if we multiply the top and bottom of the fraction by a constant \\(C\\) and push it into the sum, we get the following (mathematically equivalent) expression:
 
 $$
 \frac{e^{f\_{y\_i}}}{\sum\_j e^{f\_j}}
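For reference, the normalization trick the changed paragraph describes (picking \\(\log C = -\max\_j f\_j\\), i.e. shifting the scores so the largest value is zero before exponentiating) can be sketched in numpy roughly as follows. This snippet is illustrative only and is not part of the commit:

```python
import numpy as np

f = np.array([123, 456, 789])  # example scores; np.exp(789) would overflow to inf
# p = np.exp(f) / np.sum(np.exp(f))  # unstable: numerator and denominator blow up

# Shift the scores so the highest value is 0 (the choice log C = -max_j f_j).
# The probabilities are mathematically unchanged, but the exponentials stay small.
f = f - np.max(f)  # f becomes [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f))  # safe: no overflow, correct probabilities
```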
