|
35 | 35 | "### Some probability vocabulary:\n",
|
36 | 36 | "A **random variable**: \"A variable quantity whose value depends on possible outcomes.\" In this case, $T_1$ and $T_2$ are \"random variables\" \n",
|
37 | 37 | "An **event**: A measurable outcome of the experiment (i.e., \"$T_1 = t_1$,\" where $t_1$ is some specific number of Retweets in the sample) \n",
|
38 |
| - "**Independent events**: Two (or more) events whose outcomes do not affect eachother. In this example, all of our events are pretty much independent of eachother.\n", |
39 |
| - "A **PMF** or **probabilty mass function**: For a discrete probabilty distribution (in this case, we're dealing with a discrete distribution, because we cannot collect a sample of 10.5 Tweets), the probability mass function is a function that gives the probability of each possible event, and the sum over all possible events is 1. \n", |
| 38 | + "**Independent events**: Two (or more) events whose outcomes do not affect eachother. In this example, all of our events are pretty much independent of eachother. \n", |
| 39 | + "A **PMF** or **probabilty mass function**: For a discrete probabilty distribution (in this case, we're dealing with a discrete distribution, because we cannot collect a sample of 10.5 Tweets), the probability mass function is a function that gives the probability of each possible event, and the sum over all possible events is 1. \n", |
40 | 40 | "A **joint PMF**: A function that gives the probabilty of two events occurring together. For two independent events, we can just multiply the two individual PMFs together. Easy.\n",
|
41 | 41 | "\n",
|
42 | 42 | "\n",
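To make the vocabulary concrete, here is a minimal sketch using `scipy.stats.binom`, assuming the same `s`, `tau1`, and `tau2` values fixed in the setup cell below; the specific counts plugged into the PMFs (10 and 7) are just illustrative.

```python
# Minimal sketch: a PMF and a joint PMF for two independent binomial
# random variables, using the values fixed in the setup cell below.
from scipy.stats import binom

s = 0.1                  # sampling probability (from the setup cell)
tau1, tau2 = 100, 75     # "true" totals (from the setup cell)

T1 = binom(tau1, s)      # random variable T_1 ~ Binomial(tau1, s)
T2 = binom(tau2, s)      # random variable T_2 ~ Binomial(tau2, s)

# PMF: the probability of the single event "T_1 = 10"
print(T1.pmf(10))

# A PMF sums to 1 over all possible events
print(sum(T1.pmf(k) for k in range(tau1 + 1)))

# Joint PMF of two independent events: multiply the individual PMFs
print(T1.pmf(10) * T2.pmf(7))
```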
|
|
110 | 110 | "source": [
|
111 | 111 | "# Set up the problem. We have to fix tau_1, tau_2, and s\n",
|
112 | 112 | "tau1 = 100\n",
|
113 |
| - "tau2 = 90\n", |
| 113 | + "tau2 = 75\n", |
114 | 114 | "s = .1\n",
|
115 | 115 | "\n",
|
116 | 116 | "# We can use the binomial PMF from scipy, which is better than coding it up myself \n",
|
|
144 | 144 | {
|
145 | 145 | "cell_type": "code",
|
146 | 146 | "execution_count": null,
|
147 |
| - "metadata": {}, |
| 147 | + "metadata": { |
| 148 | + "collapsed": false |
| 149 | + }, |
148 | 150 | "outputs": [],
|
149 | 151 | "source": [
|
150 | 152 | "plt.plot([x[0] for x in probabilities_t1],[x[1] for x in probabilities_t1])\n",
|
|
157 | 159 | "cell_type": "markdown",
|
158 | 160 | "metadata": {},
|
159 | 161 | "source": [
|
160 |
| - "#### How likely is it that we get _exactly_ X% of $tau_1$? \n", |
| 162 | + "#### How likely is it that we get _exactly_ X% of $\\tau_1$? \n", |
161 | 163 | "$$ P(t_1 = s\\tau_1) = {\\tau_1\\choose s\\tau_1} s^{t_1}(1-s)^{(\\tau_1 - s\\tau_1)}$$"
|
162 | 164 | ]
|
163 | 165 | },
|
164 | 166 | {
|
165 | 167 | "cell_type": "code",
|
166 | 168 | "execution_count": null,
|
167 |
| - "metadata": {}, |
| 169 | + "metadata": { |
| 170 | + "collapsed": false |
| 171 | + }, |
168 | 172 | "outputs": [],
|
169 | 173 | "source": [
|
170 | 174 | "print(\"The probability of getting exactly s*tau_1 samples is: {:f}\".format(binomial_tau1.pmf(s*tau1)))"
|
|
181 | 185 | {
|
182 | 186 | "cell_type": "code",
|
183 | 187 | "execution_count": null,
|
184 |
| - "metadata": {}, |
| 188 | + "metadata": { |
| 189 | + "collapsed": false |
| 190 | + }, |
185 | 191 | "outputs": [],
|
186 | 192 | "source": [
|
187 |
| - "plus_or_minus_percent = .01\n", |
| 193 | + "plus_or_minus_percent = .1\n", |
188 | 194 | "lower_bound_t1 = int((1 - plus_or_minus_percent)*s*tau1)\n",
|
189 | 195 | "upper_bound_t1 = int((1 + plus_or_minus_percent)*s*tau1)\n",
|
190 | 196 | "probability_t1_interval = 0\n",
|
191 | 197 | "for i in range(lower_bound_t1,upper_bound_t1):\n",
|
192 | 198 | " probability_t1_interval += binomial_tau1.pmf(i)\n",
|
193 |
| - "print(\"P({} < t_1 < {}) = {:f}\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))" |
| 199 | + "print(\"P({} <= t_1 < {}) = {:f}\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))" |
194 | 200 | ]
|
195 | 201 | },
|
196 | 202 | {
|
|
202 | 208 | "It's relatively simple to work out the standard deviation of the Binomial distribution, if we remember a few things about expected values (averages), standard deviation and variance: \n",
|
203 | 209 | "1. Expected value is a linear operator (i.e., $E[A + 2B] = E[A] + 2E[B]$) \n",
|
204 | 210 | "2. We'll call $E[X] = \\mu$ \n",
|
205 |
| - "3. Variance: $Var[X] = E[(X - \\mu^2)] = E[X^2] - E[X]^2$\n", |
| 211 | + "3. Variance: $Var[X] = E[(X - \\mu)^2] = E[X^2] - E[X]^2$\n", |
206 | 212 | "4. If two random variables are uncorrelated, Variance is a linear operator: $Var[\\sum_i X_i] = \\sum_i Var[X_i]$\n",
|
207 | 213 | "5. Standard deviation: $\\sqrt{Var[X]}$ \n",
|
208 | 214 | "\n",
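As a quick sanity check of the facts above, the closed-form standard deviation $\sqrt{\tau_1 s(1-s)}$ of a Binomial($\tau_1$, $s$) count can be compared against scipy's built-in value; this is a minimal sketch, assuming the `tau1` and `s` values from the setup cell.

```python
# Sketch: compare sqrt(tau1 * s * (1 - s)) with scipy's built-in
# standard deviation for a Binomial(tau1, s) random variable.
from math import sqrt
from scipy.stats import binom

tau1, s = 100, 0.1                   # values from the setup cell
by_hand = sqrt(tau1 * s * (1 - s))   # closed-form standard deviation
from_scipy = binom(tau1, s).std()    # scipy's value
print(by_hand, from_scipy)           # both are 3.0 for these values
```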
|
|
221 | 227 | {
|
222 | 228 | "cell_type": "code",
|
223 | 229 | "execution_count": null,
|
224 |
| - "metadata": {}, |
| 230 | + "metadata": { |
| 231 | + "collapsed": false |
| 232 | + }, |
225 | 233 | "outputs": [],
|
226 | 234 | "source": [
|
227 | 235 | "# standard deviation varies as the size of tau1 varies\n",
|
|
250 | 258 | {
|
251 | 259 | "cell_type": "code",
|
252 | 260 | "execution_count": null,
|
253 |
| - "metadata": {}, |
| 261 | + "metadata": { |
| 262 | + "collapsed": false |
| 263 | + }, |
254 | 264 | "outputs": [],
|
255 | 265 | "source": [
|
256 | 266 | "# Let's show how likely it is to get exactly t_1 of T_1. Just for practice\n",
|
|
271 | 281 | {
|
272 | 282 | "cell_type": "code",
|
273 | 283 | "execution_count": null,
|
274 |
| - "metadata": {}, |
| 284 | + "metadata": { |
| 285 | + "collapsed": false |
| 286 | + }, |
275 | 287 | "outputs": [],
|
276 | 288 | "source": [
|
277 | 289 | "# the total probability of falling within 3 standard deviations of the mean (s*tau_1):\n",
|
278 | 290 | "plus_or_minus_stdv = .01\n",
|
279 | 291 | "lower_bound_t1 = max(0,int(s*tau1 - 3 * sqrt(tau1*s*(1-s))))\n",
|
280 | 292 | "upper_bound_t1 = int(s*tau1 + 3 * sqrt(tau1*s*(1-s)))\n",
|
281 | 293 | "probability_t1_interval = 0\n",
|
282 |
| - "for i in range(lower_bound_t1,upper_bound_t1):\n", |
| 294 | + "for i in range(lower_bound_t1,upper_bound_t1+1):\n", |
283 | 295 | " probability_t1_interval += binomial_tau1.pmf(i)\n",
|
284 | 296 | "print(\"P({} < t_1 < {}) = {:f} (very nearly 99.7%)\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))"
|
285 | 297 | ]
|
|
294 | 306 | {
|
295 | 307 | "cell_type": "code",
|
296 | 308 | "execution_count": null,
|
297 |
| - "metadata": {}, |
| 309 | + "metadata": { |
| 310 | + "collapsed": false |
| 311 | + }, |
298 | 312 | "outputs": [],
|
299 | 313 | "source": [
|
300 | 314 | "# t1 and t2 most likely fall within some range\n",
|
|
352 | 366 | {
|
353 | 367 | "cell_type": "code",
|
354 | 368 | "execution_count": null,
|
355 |
| - "metadata": {}, |
| 369 | + "metadata": { |
| 370 | + "collapsed": false |
| 371 | + }, |
356 | 372 | "outputs": [],
|
357 | 373 | "source": [
|
358 | 374 | "figure, ax = plt.subplots(1,1,figsize = (6,12))\n",
|
|
369 | 385 | {
|
370 | 386 | "cell_type": "code",
|
371 | 387 | "execution_count": null,
|
372 |
| - "metadata": {}, |
| 388 | + "metadata": { |
| 389 | + "collapsed": false |
| 390 | + }, |
373 | 391 | "outputs": [],
|
374 | 392 | "source": [
|
375 | 393 | "# The total probabilties should sum to 1\n",
|
|
381 | 399 | {
|
382 | 400 | "cell_type": "code",
|
383 | 401 | "execution_count": null,
|
384 |
| - "metadata": {}, |
| 402 | + "metadata": { |
| 403 | + "collapsed": false |
| 404 | + }, |
385 | 405 | "outputs": [],
|
386 | 406 | "source": [
|
387 | 407 | "# Now, sum only the lower triangle--the piece where t_2 >= t_1\n",
|
|
421 | 441 | "Call each PMF $P(X_i = x_i; \\theta)$ (where $\"; \\theta\"$ denotes the value of the parameter that we're trying to estimate, which the PMF is dependent on) then the probability (likelihood) of a set of $n$ results is dependent on a independent variable $\\theta$:\n",
|
422 | 442 | "$$ L(\\theta) = P(X_1 = x_1, X_2 = x_2, X_3 = x_3, \\dots, X_n = x_n; \\theta) = \\prod_{i = 0}^n P(X_i= x_i; \\theta)$$\n",
|
423 | 443 | "\n",
|
424 |
| - "Now, if we find the value of the parameter $\\theta$ that maximizes the likelihood of our results $x_i$ coming from the distribution, we have found the MAximum Likelihood Estimator for $\\theta$. Super.\n", |
| 444 | + "Now, if we find the value of the parameter $\\theta$ that maximizes the likelihood of our results $x_i$ coming from the distribution, we have found the Maximum Likelihood Estimator for $\\theta$. Super.\n", |
425 | 445 | "\n",
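For the binomial setting used throughout this notebook, the maximization can also be done in closed form. As a standard worked example, treat $\tau_1$ as known and $s$ as the parameter to estimate, take the log of the likelihood, and set its derivative to zero:

$$ \log L(s) = \log{\tau_1 \choose t_1} + t_1 \log s + (\tau_1 - t_1)\log(1-s) $$

$$ \frac{d}{ds}\log L(s) = \frac{t_1}{s} - \frac{\tau_1 - t_1}{1-s} = 0 \quad\Rightarrow\quad \hat{s} = \frac{t_1}{\tau_1} $$

That is, the maximum likelihood estimate of $s$ is simply the observed proportion $t_1/\tau_1$.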
|
426 | 446 | "#### 2. A simple example of MLE estimation\n",
|
427 | 447 | "\n",
|
|
458 | 478 | {
|
459 | 479 | "cell_type": "code",
|
460 | 480 | "execution_count": null,
|
461 |
| - "metadata": {}, |
| 481 | + "metadata": { |
| 482 | + "collapsed": true |
| 483 | + }, |
462 | 484 | "outputs": [],
|
463 | 485 | "source": [
|
464 | 486 | "# set up the experiment\n",
|
|
474 | 496 | {
|
475 | 497 | "cell_type": "code",
|
476 | 498 | "execution_count": null,
|
477 |
| - "metadata": {}, |
| 499 | + "metadata": { |
| 500 | + "collapsed": true |
| 501 | + }, |
478 | 502 | "outputs": [],
|
479 | 503 | "source": [
|
480 | 504 | "# get the likelihood results\n",
|
|
485 | 509 | {
|
486 | 510 | "cell_type": "code",
|
487 | 511 | "execution_count": null,
|
488 |
| - "metadata": {}, |
| 512 | + "metadata": { |
| 513 | + "collapsed": false |
| 514 | + }, |
489 | 515 | "outputs": [],
|
490 | 516 | "source": [
|
491 | 517 | "# plot\n",
|
|
525 | 551 | {
|
526 | 552 | "cell_type": "code",
|
527 | 553 | "execution_count": null,
|
528 |
| - "metadata": {}, |
| 554 | + "metadata": { |
| 555 | + "collapsed": true |
| 556 | + }, |
529 | 557 | "outputs": [],
|
530 | 558 | "source": [
|
531 | 559 | "# get the likelihood results\n",
|
|
536 | 564 | {
|
537 | 565 | "cell_type": "code",
|
538 | 566 | "execution_count": null,
|
539 |
| - "metadata": {}, |
| 567 | + "metadata": { |
| 568 | + "collapsed": false |
| 569 | + }, |
540 | 570 | "outputs": [],
|
541 | 571 | "source": [
|
542 | 572 | "# plot\n",
|
|
572 | 602 | {
|
573 | 603 | "cell_type": "code",
|
574 | 604 | "execution_count": null,
|
575 |
| - "metadata": {}, |
| 605 | + "metadata": { |
| 606 | + "collapsed": false |
| 607 | + }, |
576 | 608 | "outputs": [],
|
577 | 609 | "source": [
|
578 | 610 | "# let's see what we would estimate s to be analytically\n",
|
|
593 | 625 | {
|
594 | 626 | "cell_type": "code",
|
595 | 627 | "execution_count": null,
|
596 |
| - "metadata": {}, |
| 628 | + "metadata": { |
| 629 | + "collapsed": false |
| 630 | + }, |
597 | 631 | "outputs": [],
|
598 | 632 | "source": [
|
599 | 633 | "# set up the experiment\n",
|
600 | 634 | "# number of trials\n",
|
601 |
| - "n = 1000\n", |
| 635 | + "n = 100\n", |
602 | 636 | "# probabilty, s. i'll use the same \"s\" that we're using for the problem setup\n",
|
603 | 637 | "print(\"s = {}\".format(s))\n",
|
604 | 638 | "# now, fix a \"real\" tau\n",
|
|
616 | 650 | {
|
617 | 651 | "cell_type": "code",
|
618 | 652 | "execution_count": null,
|
619 |
| - "metadata": {}, |
| 653 | + "metadata": { |
| 654 | + "collapsed": true |
| 655 | + }, |
620 | 656 | "outputs": [],
|
621 | 657 | "source": [
|
622 | 658 | "# Now, evaluate the binomial distribution likelihood for this collection of results\n",
|
|
636 | 672 | {
|
637 | 673 | "cell_type": "code",
|
638 | 674 | "execution_count": null,
|
639 |
| - "metadata": {}, |
| 675 | + "metadata": { |
| 676 | + "collapsed": false |
| 677 | + }, |
640 | 678 | "outputs": [],
|
641 | 679 | "source": [
|
642 | 680 | "plt.plot([x[0] for x in likelihood_tau], [x[1] for x in likelihood_tau])\n",
|
|
667 | 705 | {
|
668 | 706 | "cell_type": "code",
|
669 | 707 | "execution_count": null,
|
670 |
| - "metadata": {}, |
| 708 | + "metadata": { |
| 709 | + "collapsed": false |
| 710 | + }, |
671 | 711 | "outputs": [],
|
672 | 712 | "source": [
|
673 | 713 | "%%time\n",
|
|
692 | 732 | " binomial_tau2 = binom(j_tau1_i_tau2[1],s)\n",
|
693 | 733 | " return (j_tau1_i_tau2, binomial_tau1.pmf(t1)*binomial_tau2.pmf(t2))\n",
|
694 | 734 | "# create the processes\n",
|
695 |
| - "pool = Pool(processes=12)\n", |
| 735 | + "pool = Pool(processes=4)\n", |
696 | 736 | "tau2_and_tau1_list = []\n",
|
697 | 737 | "probabilities_tau2_and_tau1_list = []\n",
|
698 | 738 | "for j_tau1 in range(window_est_tau1[0],window_est_tau1[1]):\n",
|
|
708 | 748 | {
|
709 | 749 | "cell_type": "code",
|
710 | 750 | "execution_count": null,
|
711 |
| - "metadata": {}, |
| 751 | + "metadata": { |
| 752 | + "collapsed": false |
| 753 | + }, |
712 | 754 | "outputs": [],
|
713 | 755 | "source": [
|
714 | 756 | "figure, ax = plt.subplots(1,1,figsize = (6,12))\n",
|
|
723 | 765 | {
|
724 | 766 | "cell_type": "code",
|
725 | 767 | "execution_count": null,
|
726 |
| - "metadata": {}, |
| 768 | + "metadata": { |
| 769 | + "collapsed": false |
| 770 | + }, |
727 | 771 | "outputs": [],
|
728 | 772 | "source": [
|
729 | 773 | "print(\"Estimate of the probabilty that tau_1 < tau_2 given that t_1 = {} and t_2 = {}: {}\".format(\n",
|
|