Commit fafbcfc

committed
fixed a few typos
1 parent b310095 commit fafbcfc

1 file changed

+77
-33
lines changed

counting-and-MLEs/counting-and-MLEs.ipynb

+77-33
@@ -35,8 +35,8 @@
3535
"### Some probability vocabulary:\n",
3636
"A **random variable**: \"A variable quantity whose value depends on possible outcomes.\" In this case, $T_1$ and $T_2$ are \"random variables\" \n",
3737
"An **event**: A measurable outcome of the experiment (i.e., \"$T_1 = t_1$,\" where $t_1$ is some specific number of Retweets in the sample) \n",
38-
"**Independent events**: Two (or more) events whose outcomes do not affect eachother. In this example, all of our events are pretty much independent of eachother.\n",
39-
"A **PMF** or **probabilty mass function**: For a discrete probabilty distribution (in this case, we're dealing with a discrete distribution, because we cannot collect a sample of 10.5 Tweets), the probability mass function is a function that gives the probability of each possible event, and the sum over all possible events is 1. \n",
38+
"**Independent events**: Two (or more) events whose outcomes do not affect each other. In this example, all of our events are pretty much independent of each other. \n",
39+
"A **PMF** or **probability mass function**: For a discrete probability distribution (in this case, we're dealing with a discrete distribution, because we cannot collect a sample of 10.5 Tweets), the probability mass function is a function that gives the probability of each possible event, and the sum over all possible events is 1. \n",
4040
"A **joint PMF**: A function that gives the probability of two events occurring together. For two independent events, we can just multiply the two individual PMFs together. Easy.\n",
4141
"\n",
4242
"\n",
@@ -110,7 +110,7 @@
110110
"source": [
111111
"# Set up the problem. We have to fix tau_1, tau_2, and s\n",
112112
"tau1 = 100\n",
113-
"tau2 = 90\n",
113+
"tau2 = 75\n",
114114
"s = .1\n",
115115
"\n",
116116
"# We can use the binomial PMF from scipy, which is better than coding it up myself \n",
@@ -144,7 +144,9 @@
144144
{
145145
"cell_type": "code",
146146
"execution_count": null,
147-
"metadata": {},
147+
"metadata": {
148+
"collapsed": false
149+
},
148150
"outputs": [],
149151
"source": [
150152
"plt.plot([x[0] for x in probabilities_t1],[x[1] for x in probabilities_t1])\n",
@@ -157,14 +159,16 @@
157159
"cell_type": "markdown",
158160
"metadata": {},
159161
"source": [
160-
"#### How likely is it that we get _exactly_ X% of $tau_1$? \n",
162+
"#### How likely is it that we get _exactly_ X% of $\\tau_1$? \n",
161163
"$$ P(t_1 = s\\tau_1) = {\\tau_1\\choose s\\tau_1} s^{s\\tau_1}(1-s)^{(\\tau_1 - s\\tau_1)}$$"
162164
]
163165
},
164166
{
165167
"cell_type": "code",
166168
"execution_count": null,
167-
"metadata": {},
169+
"metadata": {
170+
"collapsed": false
171+
},
168172
"outputs": [],
169173
"source": [
170174
"print(\"The probability of getting exactly s*tau_1 samples is: {:f}\".format(binomial_tau1.pmf(s*tau1)))"
@@ -181,16 +185,18 @@
181185
{
182186
"cell_type": "code",
183187
"execution_count": null,
184-
"metadata": {},
188+
"metadata": {
189+
"collapsed": false
190+
},
185191
"outputs": [],
186192
"source": [
187-
"plus_or_minus_percent = .01\n",
193+
"plus_or_minus_percent = .1\n",
188194
"lower_bound_t1 = int((1 - plus_or_minus_percent)*s*tau1)\n",
189195
"upper_bound_t1 = int((1 + plus_or_minus_percent)*s*tau1)\n",
190196
"probability_t1_interval = 0\n",
191197
"for i in range(lower_bound_t1,upper_bound_t1):\n",
192198
" probability_t1_interval += binomial_tau1.pmf(i)\n",
193-
"print(\"P({} < t_1 < {}) = {:f}\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))"
199+
"print(\"P({} <= t_1 < {}) = {:f}\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))"
194200
]
195201
},
196202
{
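The interval sum in the cell above can be sketched with only the standard library. This is an illustration under assumptions, not the notebook's code: `binom_pmf` is a hypothetical stand-in for the scipy PMF that the notebook calls through `binomial_tau1.pmf`. Because Python's `range` excludes its upper end, the loop accumulates the probability of `lower_bound_t1 <= t_1 < upper_bound_t1`, which is why the printed label uses `<=` on the left and `<` on the right.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial PMF: P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

tau1, s = 100, 0.1
plus_or_minus_percent = 0.1

# Same bounds as the notebook cell: int() truncates toward zero
lower_bound_t1 = int((1 - plus_or_minus_percent) * s * tau1)
upper_bound_t1 = int((1 + plus_or_minus_percent) * s * tau1)

# range() is half-open, so this sums P(lower_bound_t1 <= t_1 < upper_bound_t1)
probability_t1_interval = sum(binom_pmf(i, tau1, s)
                              for i in range(lower_bound_t1, upper_bound_t1))
print("P({} <= t_1 < {}) = {:f}".format(
    lower_bound_t1, upper_bound_t1, probability_t1_interval))
```

With these values the sum has only two terms (`t_1 = 9` and `t_1 = 10`), so widening the window by even one count changes the result noticeably.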
@@ -202,7 +208,7 @@
202208
"It's relatively simple to work out the standard deviation of the Binomial distribution, if we remember a few things about expected values (averages), standard deviation and variance: \n",
203209
"1. Expected value is a linear operator (i.e., $E[A + 2B] = E[A] + 2E[B]$) \n",
204210
"2. We'll call $E[X] = \\mu$ \n",
205-
"3. Variance: $Var[X] = E[(X - \\mu^2)] = E[X^2] - E[X]^2$\n",
211+
"3. Variance: $Var[X] = E[(X - \\mu)^2] = E[X^2] - E[X]^2$\n",
206212
"4. If the random variables $X_i$ are uncorrelated, variance is additive: $Var[\\sum_i X_i] = \\sum_i Var[X_i]$\n",
207213
"5. Standard deviation: $\\sqrt{Var[X]}$ \n",
208214
"\n",
@@ -221,7 +227,9 @@
221227
{
222228
"cell_type": "code",
223229
"execution_count": null,
224-
"metadata": {},
230+
"metadata": {
231+
"collapsed": false
232+
},
225233
"outputs": [],
226234
"source": [
227235
"# standard deviation varies as the size of tau1 varies\n",
@@ -250,7 +258,9 @@
250258
{
251259
"cell_type": "code",
252260
"execution_count": null,
253-
"metadata": {},
261+
"metadata": {
262+
"collapsed": false
263+
},
254264
"outputs": [],
255265
"source": [
256266
"# Let's show how likely it is to get exactly t_1 of T_1. Just for practice\n",
@@ -271,15 +281,17 @@
271281
{
272282
"cell_type": "code",
273283
"execution_count": null,
274-
"metadata": {},
284+
"metadata": {
285+
"collapsed": false
286+
},
275287
"outputs": [],
276288
"source": [
277289
"# the total probability of falling within 3 standard deviations of the mean (s*tau_1):\n",
278290
"plus_or_minus_stdv = .01\n",
279291
"lower_bound_t1 = max(0,int(s*tau1 - 3 * sqrt(tau1*s*(1-s))))\n",
280292
"upper_bound_t1 = int(s*tau1 + 3 * sqrt(tau1*s*(1-s)))\n",
281293
"probability_t1_interval = 0\n",
282-
"for i in range(lower_bound_t1,upper_bound_t1):\n",
294+
"for i in range(lower_bound_t1,upper_bound_t1+1):\n",
283295
" probability_t1_interval += binomial_tau1.pmf(i)\n",
284296
"print(\"P({} <= t_1 <= {}) = {:f} (very nearly 99.7%)\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))"
285297
]
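A standalone sketch of the three-standard-deviation check, assuming a hypothetical `binom_pmf` helper in place of the scipy distribution object used in the notebook. The loop includes the upper bound (`range(..., upper_bound_t1 + 1)`), so the sum covers `lower_bound_t1 <= t_1 <= upper_bound_t1`.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """Binomial PMF: P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

tau1, s = 100, 0.1

mean = s * tau1
stdv = sqrt(tau1 * s * (1 - s))

# Clamp the lower bound at 0, since a count cannot be negative
lower_bound_t1 = max(0, int(mean - 3 * stdv))
upper_bound_t1 = int(mean + 3 * stdv)

# Inclusive on both ends: range() stops before its second argument
probability_t1_interval = sum(binom_pmf(i, tau1, s)
                              for i in range(lower_bound_t1, upper_bound_t1 + 1))
print("P({} <= t_1 <= {}) = {:f}".format(
    lower_bound_t1, upper_bound_t1, probability_t1_interval))
```

The 99.7% figure is the normal-distribution rule of thumb; for a discrete, skewed binomial the result is only expected to be close to it.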
@@ -294,7 +306,9 @@
294306
{
295307
"cell_type": "code",
296308
"execution_count": null,
297-
"metadata": {},
309+
"metadata": {
310+
"collapsed": false
311+
},
298312
"outputs": [],
299313
"source": [
300314
"# t1 and t2 most likely fall within some range\n",
@@ -352,7 +366,9 @@
352366
{
353367
"cell_type": "code",
354368
"execution_count": null,
355-
"metadata": {},
369+
"metadata": {
370+
"collapsed": false
371+
},
356372
"outputs": [],
357373
"source": [
358374
"figure, ax = plt.subplots(1,1,figsize = (6,12))\n",
@@ -369,7 +385,9 @@
369385
{
370386
"cell_type": "code",
371387
"execution_count": null,
372-
"metadata": {},
388+
"metadata": {
389+
"collapsed": false
390+
},
373391
"outputs": [],
374392
"source": [
375393
"# The total probabilities should sum to 1\n",
@@ -381,7 +399,9 @@
381399
{
382400
"cell_type": "code",
383401
"execution_count": null,
384-
"metadata": {},
402+
"metadata": {
403+
"collapsed": false
404+
},
385405
"outputs": [],
386406
"source": [
387407
"# Now, sum only the lower triangle--the piece where t_2 >= t_1\n",
@@ -421,7 +441,7 @@
421441
"Call each PMF $P(X_i = x_i; \\theta)$ (where $\"; \\theta\"$ denotes the value of the parameter that we're trying to estimate, which the PMF depends on). Then the probability (likelihood) of a set of $n$ independent results is a function of $\\theta$:\n",
422442
"$$ L(\\theta) = P(X_1 = x_1, X_2 = x_2, X_3 = x_3, \\dots, X_n = x_n; \\theta) = \\prod_{i = 1}^n P(X_i= x_i; \\theta)$$\n",
423443
"\n",
424-
"Now, if we find the value of the parameter $\\theta$ that maximizes the likelihood of our results $x_i$ coming from the distribution, we have found the MAximum Likelihood Estimator for $\\theta$. Super.\n",
444+
"Now, if we find the value of the parameter $\\theta$ that maximizes the likelihood of our results $x_i$ coming from the distribution, we have found the Maximum Likelihood Estimator for $\\theta$. Super.\n",
425445
"\n",
426446
"#### 2. A simple example of MLE estimation\n",
427447
"\n",
@@ -458,7 +478,9 @@
458478
{
459479
"cell_type": "code",
460480
"execution_count": null,
461-
"metadata": {},
481+
"metadata": {
482+
"collapsed": true
483+
},
462484
"outputs": [],
463485
"source": [
464486
"# set up the experiment\n",
@@ -474,7 +496,9 @@
474496
{
475497
"cell_type": "code",
476498
"execution_count": null,
477-
"metadata": {},
499+
"metadata": {
500+
"collapsed": true
501+
},
478502
"outputs": [],
479503
"source": [
480504
"# get the likelihood results\n",
@@ -485,7 +509,9 @@
485509
{
486510
"cell_type": "code",
487511
"execution_count": null,
488-
"metadata": {},
512+
"metadata": {
513+
"collapsed": false
514+
},
489515
"outputs": [],
490516
"source": [
491517
"# plot\n",
@@ -525,7 +551,9 @@
525551
{
526552
"cell_type": "code",
527553
"execution_count": null,
528-
"metadata": {},
554+
"metadata": {
555+
"collapsed": true
556+
},
529557
"outputs": [],
530558
"source": [
531559
"# get the likelihood results\n",
@@ -536,7 +564,9 @@
536564
{
537565
"cell_type": "code",
538566
"execution_count": null,
539-
"metadata": {},
567+
"metadata": {
568+
"collapsed": false
569+
},
540570
"outputs": [],
541571
"source": [
542572
"# plot\n",
@@ -572,7 +602,9 @@
572602
{
573603
"cell_type": "code",
574604
"execution_count": null,
575-
"metadata": {},
605+
"metadata": {
606+
"collapsed": false
607+
},
576608
"outputs": [],
577609
"source": [
578610
"# let's see what we would estimate s to be analytically\n",
@@ -593,12 +625,14 @@
593625
{
594626
"cell_type": "code",
595627
"execution_count": null,
596-
"metadata": {},
628+
"metadata": {
629+
"collapsed": false
630+
},
597631
"outputs": [],
598632
"source": [
599633
"# set up the experiment\n",
600634
"# number of trials\n",
601-
"n = 1000\n",
635+
"n = 100\n",
602636
"# probability, s. I'll use the same \"s\" that we're using for the problem setup\n",
603637
"print(\"s = {}\".format(s))\n",
604638
"# now, fix a \"real\" tau\n",
@@ -616,7 +650,9 @@
616650
{
617651
"cell_type": "code",
618652
"execution_count": null,
619-
"metadata": {},
653+
"metadata": {
654+
"collapsed": true
655+
},
620656
"outputs": [],
621657
"source": [
622658
"# Now, evaluate the binomial distribution likelihood for this collection of results\n",
@@ -636,7 +672,9 @@
636672
{
637673
"cell_type": "code",
638674
"execution_count": null,
639-
"metadata": {},
675+
"metadata": {
676+
"collapsed": false
677+
},
640678
"outputs": [],
641679
"source": [
642680
"plt.plot([x[0] for x in likelihood_tau], [x[1] for x in likelihood_tau])\n",
@@ -667,7 +705,9 @@
667705
{
668706
"cell_type": "code",
669707
"execution_count": null,
670-
"metadata": {},
708+
"metadata": {
709+
"collapsed": false
710+
},
671711
"outputs": [],
672712
"source": [
673713
"%%time\n",
@@ -692,7 +732,7 @@
692732
" binomial_tau2 = binom(j_tau1_i_tau2[1],s)\n",
693733
" return (j_tau1_i_tau2, binomial_tau1.pmf(t1)*binomial_tau2.pmf(t2))\n",
694734
"# create the processes\n",
695-
"pool = Pool(processes=12)\n",
735+
"pool = Pool(processes=4)\n",
696736
"tau2_and_tau1_list = []\n",
697737
"probabilities_tau2_and_tau1_list = []\n",
698738
"for j_tau1 in range(window_est_tau1[0],window_est_tau1[1]):\n",
@@ -708,7 +748,9 @@
708748
{
709749
"cell_type": "code",
710750
"execution_count": null,
711-
"metadata": {},
751+
"metadata": {
752+
"collapsed": false
753+
},
712754
"outputs": [],
713755
"source": [
714756
"figure, ax = plt.subplots(1,1,figsize = (6,12))\n",
@@ -723,7 +765,9 @@
723765
{
724766
"cell_type": "code",
725767
"execution_count": null,
726-
"metadata": {},
768+
"metadata": {
769+
"collapsed": false
770+
},
727771
"outputs": [],
728772
"source": [
729773
"print(\"Estimate of the probability that tau_1 < tau_2 given that t_1 = {} and t_2 = {}: {}\".format(\n",
