can be interpreted as the (normalized) probability assigned to the correct label \\(y_i\\) given the image \\(x_i\\) and parameterized by \\(W\\). To see this, remember that the Softmax classifier interprets the scores inside the output vector \\(f\\) as the unnormalized log probabilities. Exponentiating these quantities therefore gives the (unnormalized) probabilities, and the division performs the normalization so that the probabilities sum to one. In the probabilistic interpretation, we are therefore minimizing the negative log likelihood of the correct class, which can be interpreted as performing *Maximum Likelihood Estimation* (MLE). A nice feature of this view is that we can now also interpret the regularization term \\(R(W)\\) in the full loss function as coming from a Gaussian prior over the weight matrix \\(W\\), where instead of MLE we are performing the *Maximum a posteriori* (MAP) estimation. We mention these interpretations to help your intuitions, but the full details of this derivation are beyond the scope of this class.
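To make the interpretation concrete, here is a minimal numpy sketch of this computation for a single example: exponentiate the scores, normalize so they sum to one, and take the negative log of the correct class's probability. The score values and the class index `y` are illustrative placeholders, not taken from the text above:

```python
import numpy as np

f = np.array([3.2, 5.1, -1.7])  # hypothetical unnormalized log probabilities (class scores)
y = 0                           # hypothetical index of the correct class

f = f - np.max(f)               # shift scores for numeric stability; the probabilities are unchanged
probs = np.exp(f) / np.sum(np.exp(f))  # exponentiate, then normalize to sum to one
loss = -np.log(probs[y])        # negative log likelihood of the correct class
```

Note that subtracting the maximum score before exponentiating leaves the normalized probabilities identical but avoids overflow from large exponents.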