Commit 0e41b87

adding p-values
1 parent 5471e94 commit 0e41b87

1 file changed

+42
-0
lines changed

lectures/inference/inference.Rmd

@@ -652,6 +652,48 @@ plot(N, polls$observations)
abline(0,1)
```

## p-values

p-values are ubiquitous in the scientific literature. They are related to confidence intervals, so we introduce the concept here.

Let's consider the blue and red beads. Suppose that rather than wanting an estimate of the percent of blue beads, I am more interested in the question: are there more blue beads or red beads?

Suppose we take a random sample of $N=100$ and we observe 53 blue beads. This seems to point to there being more blue than red. However, as data scientists we need to be skeptical. We know there is chance involved in this process and we could get a 53 even when the proportions of red and blue are the same. We call the claim that the proportions are the same a _null hypothesis_. The null hypothesis is the skeptic's hypothesis: the proportion of blue beads $p$ is 0.5. We have observed a random variable $\hat{p} = 0.53$, and the p-value is the answer to the question: how likely is it to see a value at least this far from 0.5 when the null hypothesis is true? So we write
$$\mbox{Pr}(\mid \hat{p} - 0.5 \mid > 0.03 ) $$

assuming $p=0.5$. Under the null hypothesis we know that
$$
\sqrt{N}\frac{\hat{p} - 0.5}{\sqrt{0.5(1-0.5)}}
$$

is approximately standard normal. So we can compute the probability above, which is the p-value.
$$\mbox{Pr}\left(\sqrt{N}\frac{\mid \hat{p} - 0.5\mid}{\sqrt{0.5(1-0.5)}} > \sqrt{N}\frac{0.03}{\sqrt{0.5(1-0.5)}}\right) $$
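To make the arithmetic concrete, the bead example with $N=100$ can be worked out by hand. This is only a sketch of the calculation: $Z$ (a standard normal random variable) and $\Phi$ (its cumulative distribution function) are notation introduced here for illustration, and the second equality uses the symmetry of the normal distribution.

$$
\sqrt{N}\,\frac{0.03}{\sqrt{0.5(1-0.5)}} = 10 \times \frac{0.03}{0.5} = 0.6,
\qquad
\mbox{Pr}(\mid Z \mid > 0.6) = 2\left\{1 - \Phi(0.6)\right\} \approx 0.55
$$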
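```{r}
N <- 100
# z-score of the observed deviation 0.03 under the null hypothesis p = 0.5
z <- sqrt(N) * 0.03 / 0.5
# two-sided p-value: probability that a standard normal falls outside [-z, z]
1 - (pnorm(z) - pnorm(-z))
```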
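So we do in fact have reason to be skeptical. By constructing a p-value we see that, if the proportions of blue and red beads were equal, a difference of 0.03 or more from 0.5 would be observed about 55% of the time.

Assessment:
Later we see a $\hat{p}=0.530347$ (that is, 53.0347%) that was obtained with $N=4397$. What is the p-value?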
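
For questions like this one, it can help to wrap the calculation from this section in a function. The following is a minimal sketch, assuming the same normal approximation used in this section; the name `p_value_two_sided` and its arguments `p_hat`, `N` and `p0` are hypothetical and do not appear in the original lecture code.

```{r}
# Hypothetical helper (not part of the original lecture code):
# two-sided p-value for the null hypothesis p = p0,
# based on the normal approximation to the distribution of p_hat.
p_value_two_sided <- function(p_hat, N, p0 = 0.5) {
  se <- sqrt(p0 * (1 - p0) / N)  # standard error of p_hat under the null
  z <- abs(p_hat - p0) / se      # observed deviation in standard-error units
  2 * (1 - pnorm(z))             # probability in both tails of the standard normal
}

# Bead example from this section: 53 blue beads out of 100
p_value_two_sided(0.53, 100)  # about 0.55
```

Because the standard error shrinks as `N` grows, the same observed proportion gives a smaller p-value with a larger sample.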
#### Setting the random seed

Before we continue, we briefly explain the following important line of
