---
title: "Problem Set / Data Exercise Example"
author: "Devin Judge-Lord"
date: \today
output: pdf_document
header-includes: ## Add any Latex packages you need (or use a preamble/template)
- \usepackage{setspace} ## spacing text
---
```{r setup, include=FALSE}
## Sets defaults for R chunks
knitr::opts_chunk$set(echo = TRUE, # echo = TRUE means that your code will show
                      warning = FALSE,
                      message = FALSE,
                      # fig.path = 'Figs/', ## where to save figures
                      fig.height = 3,
                      fig.width = 3,
                      fig.align = 'center')

## Add any R packages you require.
## Here are some we will use in 811:
requires <- c("tidyverse", # tidyverse includes dplyr and ggplot2
              "magrittr",
              "foreign",
              "readstata13",
              "here")

## Install any you don't have (skip install.packages() if nothing is missing)
to_install <- !(requires %in% rownames(installed.packages()))
if (any(to_install)) {
  install.packages(requires[to_install], repos = "https://cloud.r-project.org/")
}

## Load all required R packages
library(tidyverse)
library(ggplot2); theme_set(theme_bw())
library(magrittr)
library(here)
```
<!-- The above header sets everything up. -->
<!-- The below is just example content, edit/delete as needed. -->
<!-- NOTE: Just like LaTeX, Markdown is plain text. To use LaTeX syntax: $LaTeX syntax$ -->

Imagine that you are provided a sample of data and asked to estimate the linear regression model $y_i = \alpha + \beta x_i + \epsilon_i$ (or, in equivalent notation, $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$).

Let us say that these data contain 20 observations for two variables:

- `Leg_Act` $\in [-20, 40]$ is the legislative activity of state assembly members, where -20 is no significant legislative activity and 40 is the maximum level of activity. This is the dependent variable, $Y$, with each observation being a $y_i$.
- `terms` is the number of terms in office. This is your explanatory variable, $X$, with each observation being an $x_i$.

You have a number of tasks:

1. Plot the dependent variable against the explanatory variable.
2. Estimate the parameters $\alpha$ and $\beta$.
3. Compute the residuals: the differences between the observed values of the dependent variable and the values predicted by the estimated linear model (i.e., the vertical distance of each observed $y_i$ from the regression line).
4. Plot the residuals against the explanatory variable.
5. Correlate the observed values of the dependent variable $Y$ (the vector of each $y_i$) with the predicted values $\hat{Y}$.
6. Compare the square of this correlation (between the observed values of $Y$ and predicted $\hat{Y}$) to the model $R^2$.
7. Test the null hypothesis that $\beta = 0$ against the alternative that $\beta \neq 0$.
8. Write a paragraph (double-spaced) interpreting the parameters and explaining the results of your hypothesis test.
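
For reference, the standard ordinary least squares (OLS) formulas behind tasks 2 through 7 are sketched below in the notation of the model above, for $n$ observations (here $n = 20$). These are textbook results rather than anything specific to this assignment; under $H_0: \beta = 0$ and the usual assumption of independent, normally distributed errors, $t$ follows a $t$ distribution with $n - 2$ degrees of freedom.

$$
\begin{aligned}
\hat{\beta} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}, \\
\hat{y}_i &= \hat{\alpha} + \hat{\beta} x_i, \qquad e_i = y_i - \hat{y}_i, \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \\
t &= \frac{\hat{\beta}}{\widehat{SE}(\hat{\beta})}, \qquad \widehat{SE}(\hat{\beta}) = \sqrt{\frac{\sum_{i=1}^{n} e_i^2 / (n - 2)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
\end{aligned}
$$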

**But**, for whatever reason, you want to do your problem set in R. [R Markdown](http://rmarkdown.rstudio.com) offers an easy way to do this without cutting and pasting output. If you accidentally regressed $X$ on $Y$ rather than $Y$ on $X$, fix the model and **pow**: your plots and the estimates cited in your discussion are instantly corrected.

- [Here is the R Markdown template](https://github.com/judgelord/PS811/raw/master/example_problem_set.Rmd) that made this pdf. Save it as a .Rmd file.
- [Here is a pdf about writing in R Markdown](https://github.com/judgelord/PS811/raw/master/example_notes.pdf).

**But** the data are in STATA!?! No problem. R can read .dta files.

In STATA, save the data generated by the `PS813_EX1` function with your seed:
```{}
net install PS813_EX1, from(https://faculty.polisci.wisc.edu/weimer/)
PS813_EX1 yourseed
save "EX1.dta"
```

Alternatively, run STATA in a chunk (R Markdown supports [many languages](https://bookdown.org/yihui/rmarkdown/language-engines.html)!). First, [install Statamarkdown](https://www.ssc.wisc.edu/~hemken/Stataworkshops/Stata%20and%20R%20Markdown/InstallingStatamarkdown.html). Then add a STATA setup chunk (just like our R setup chunk above) that enables STATA chunks: [instructions here](https://www.ssc.wisc.edu/~hemken/Stataworkshops/Stata%20and%20R%20Markdown/randstata.html).

Then load the data into R with the `readstata13` package:

**Note: R looks for "EX1.dta" in a folder called "data" at the project root located by `here()`, which is usually the folder where this .Rmd file is saved.**

```{r data}
## Load your data, defining an R object called "d"
## (d is NULL if the file is missing, so the next chunk can supply placeholder data)
path <- here("data/EX1.dta")
d <- if (file.exists(path)) readstata13::read.dta13(path) else NULL
if (!is.null(d)) glimpse(d)
```

```{r if_data_fail, echo=FALSE}
# Empty placeholder data if loading the data failed
if (is.null(d)) {
  d <- data.frame("terms" = 0, "Leg_Act" = 0)
  print("data/EX1.dta NOT FOUND in a folder called data where this .Rmd file is saved")
}
```
Now on to the tasks:
\newpage
<!-- Obviously, delete the above before you turn this in -->
In STATA, generate data with the `PS813_EX1` function:
```{}
net install PS813_EX1, from(https://faculty.polisci.wisc.edu/weimer/)
PS813_EX1 yourseed
save "EX1.dta"
```
# 1. A plot of Legislative Activity by Terms in Office
```{r plot_variables}
## STATA: plot Leg_Act terms
## R:
ggplot(d, aes(y = Leg_Act, x = terms)) +
geom_point()
```
```{r correlation_variables}
## STATA: corr Leg_Act terms
## R:
corXY <- cor(d$Leg_Act, d$terms)
corXY
```
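
`cor()` computes the Pearson correlation. As a minimal sketch, the same number can be recovered from the definition $r = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$. The data here are simulated: `x` and `y` are hypothetical stand-ins for `d$terms` and `d$Leg_Act`, and the seed and data-generating values are arbitrary assumptions.

```{r pearson_by_hand_sketch, eval=FALSE}
## Sketch: Pearson correlation from its definition (simulated, hypothetical data)
set.seed(811)                            # arbitrary seed for reproducibility
x <- sample(1:10, 20, replace = TRUE)    # hypothetical stand-in for "terms"
y <- -20 + 4 * x + rnorm(20, sd = 8)     # hypothetical stand-in for "Leg_Act"

dx <- x - mean(x)                        # deviations of x from its mean
dy <- y - mean(y)                        # deviations of y from its mean
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

all.equal(r_by_hand, cor(x, y))          # TRUE (up to floating-point tolerance)
```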
The correlation between Legislative Activity and Terms in Office is `r corXY`.

# 2. Estimating linear regression
```{r regression}
## STATA: regress Leg_Act terms
## R:
model <- lm(Leg_Act ~ terms, data = d)
# summary(model)
alpha <- coef(model)[["(Intercept)"]]  # estimated intercept
beta <- coef(model)[["terms"]]         # estimated slope on terms
```
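
`lm()` finds $\hat{\alpha}$ and $\hat{\beta}$ by least squares. As a minimal sketch, the closed-form estimates $\hat{\beta} = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ can be checked against `lm()`. The data are simulated: `x`, `y`, and `fit` are hypothetical stand-ins for `d$terms`, `d$Leg_Act`, and `model`, and the seed and data-generating values are arbitrary assumptions.

```{r ols_by_hand_sketch, eval=FALSE}
## Sketch: OLS estimates from the closed-form formulas (simulated, hypothetical data)
set.seed(811)                            # arbitrary seed for reproducibility
x <- sample(1:10, 20, replace = TRUE)    # hypothetical stand-in for "terms"
y <- -20 + 4 * x + rnorm(20, sd = 8)     # hypothetical stand-in for "Leg_Act"

beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

fit <- lm(y ~ x)                         # R's built-in least-squares fit
all.equal(unname(coef(fit)), c(alpha_hat, beta_hat))   # TRUE
```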
<!-- We can print R objects right in the text by typing "r object" between grave accents (backticks) -->
Estimated regression coefficients: $\hat{\alpha}$ = `r alpha` and $\hat{\beta}$ = `r beta`.

# 3. Computing residuals
```{r residuals}
## STATA: predict p_Leg_Act
## R:
d$p_Leg_Act <- predict(model)
## STATA: generate resid = Leg_Act - p_Leg_Act
## R:
d$resid <- d$Leg_Act - d$p_Leg_Act
```
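
Because the model includes an intercept, OLS residuals sum to zero (up to floating-point noise) and are uncorrelated with the explanatory variable, which is why a residual plot against `terms` should show no linear trend. A minimal sketch on simulated data (`x`, `y`, and `fit` are hypothetical stand-ins for `d$terms`, `d$Leg_Act`, and `model`) checks that hand-computed residuals match `residuals()` and have these properties:

```{r residuals_sketch, eval=FALSE}
## Sketch: properties of OLS residuals (simulated, hypothetical data)
set.seed(811)                            # arbitrary seed for reproducibility
x <- sample(1:10, 20, replace = TRUE)    # hypothetical stand-in for "terms"
y <- -20 + 4 * x + rnorm(20, sd = 8)     # hypothetical stand-in for "Leg_Act"
fit <- lm(y ~ x)

e <- y - predict(fit)                    # observed minus predicted
all.equal(unname(e), unname(residuals(fit)))  # TRUE: same as R's residuals()
sum(e)                                   # essentially 0 (floating-point noise)
sum(e * x)                               # essentially 0: residuals orthogonal to x
```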
# 4. Plot of Residuals
```{r plot_residuals, fig.width = 5}
## STATA: plot resid terms
## R:
ggplot(d) +
aes(y = resid, x = terms) + # "aesthetics"
geom_point() + # a layer of points
## to show how residuals are the vertical distance between an observation and the regression line:
geom_hline(yintercept = 0) +
geom_col(alpha = .1, width = .1, position = "dodge") +
## + labels:
labs(title = "Residuals (Observed - Predicted Legislative Activity)",
x = "Terms in Office",
y = "Residuals")
```
# 5. $Cor(Y,\hat{Y})$
```{r correlation_observed_predicted}
## STATA: corr Leg_Act p_Leg_Act
## R:
correlation <- cor(d$Leg_Act, d$p_Leg_Act)
```
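
In a simple regression with an intercept, $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ is a linear function of $x_i$, so $Cor(Y,\hat{Y}) = |Cor(Y, X)|$, and its square equals $R^2$ (the equality $Cor(Y,\hat{Y})^2 = R^2$ also holds for OLS with an intercept and several explanatory variables). A minimal sketch on simulated data (`x`, `y`, and `fit` are hypothetical stand-ins for `d$terms`, `d$Leg_Act`, and `model`) checks both facts:

```{r r2_sketch, eval=FALSE}
## Sketch: Cor(Y, Y-hat)^2 equals R^2 (simulated, hypothetical data)
set.seed(811)                            # arbitrary seed for reproducibility
x <- sample(1:10, 20, replace = TRUE)    # hypothetical stand-in for "terms"
y <- -20 + 4 * x + rnorm(20, sd = 8)     # hypothetical stand-in for "Leg_Act"
fit <- lm(y ~ x)

cor(y, fitted(fit))^2                    # squared correlation of Y and Y-hat
summary(fit)$r.squared                   # model R^2
all.equal(cor(y, fitted(fit))^2, summary(fit)$r.squared)   # TRUE
all.equal(cor(y, fitted(fit)), abs(cor(x, y)))             # TRUE
```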
$Cor(Y,\hat{Y})$ = `r correlation`.

# 6. $Cor(Y,\hat{Y})^2$ vs. $R^2$
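```{r correlation_vs_R2}
## STATA: generate r2 = r(rho)*r(rho)   (squared correlation, after corr Leg_Act p_Leg_Act)
## STATA: display e(r2)                  (R-squared stored by regress)
## R:
cor2 <- correlation^2           # squared correlation between Y and Y-hat
r2 <- summary(model)$r.squared  # R^2 reported by the regression
```

$Cor(Y,\hat{Y})^2$ = `r cor2` and $R^2$ = `r r2`.

# 7. Hypothesis test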
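
`summary(model)` reports the $t$ statistic and two-sided $p$-value for $H_0: \beta = 0$. A minimal sketch on simulated data (`x`, `y`, and `fit` are hypothetical stand-ins for `d$terms`, `d$Leg_Act`, and `model`) computes them by hand, using $t = \hat{\beta} / \widehat{SE}(\hat{\beta})$ on $n - 2$ degrees of freedom, and compares them with `summary()`:

```{r t_test_sketch, eval=FALSE}
## Sketch: t test of H0: beta = 0 by hand (simulated, hypothetical data)
set.seed(811)                            # arbitrary seed for reproducibility
x <- sample(1:10, 20, replace = TRUE)    # hypothetical stand-in for "terms"
y <- -20 + 4 * x + rnorm(20, sd = 8)     # hypothetical stand-in for "Leg_Act"
fit <- lm(y ~ x)

n <- length(y)
e <- residuals(fit)
se_beta <- sqrt(sum(e^2) / (n - 2) / sum((x - mean(x))^2))       # SE of beta-hat
t_stat  <- coef(fit)[["x"]] / se_beta                             # t statistic
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)    # two-sided p-value

summary(fit)$coefficients["x", c("t value", "Pr(>|t|)")]          # R's version
```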
\Large
Lorem ipsum $H_0: \beta = 0$

Lorem ipsum $H_A: \beta \neq 0$

# 8. Discussion
<!-- If printing assignments, it is nice to use \large or \Large text -->
\Large
\doublespacing
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.