Start ch 2

daob · daob · commit 5a53afe2575f · 2016-06-19T15:48:26.000+02:00
diff --git a/README.md b/README.md
@@ -3,32 +3,5 @@
 <a href=https://www.datacamp.com/teach/repositories/52621483/go target="_blank"><img src="https://s3.amazonaws.com/assets.datacamp.com/img/github/content-engineering-repos/course_button.png" width="150"></a>
 <a href=http://www.datacamp.com/teach/repositories target="_blank"><img src="https://s3.amazonaws.com/assets.datacamp.com/img/github/content-engineering-repos/dashboard_button.png" width="150"></a>
 
-These are the four <a href=https://www.datacamp.com target="_blank">DataCamp</a> assignments to go with SURV730 in the JPSM online/IPSDS program. Please see <a href="http://jpsmonline.umd.edu/course/view.php?id=57">JPSM online</a> for more information, slides, videos, and the syllabus.
+These are the three <a href=https://www.datacamp.com target="_blank">DataCamp</a> assignments to go with SURV730 in the JPSM online/IPSDS program. Please see <a href="http://jpsmonline.umd.edu/course/view.php?id=57">JPSM online</a> for more information, slides, videos, and the syllabus.
 
-
-## Workflow
-
-1. Edit the markdown and yml files in this repository. You can use GitHub's online editor or use <a href=https://git-scm.com/ target="_blank">git</a> locally and push your changes.
-2. Check out your build attempts on the <a href=http://www.datacamp.com/teach/repositories target="_blank">Teach Dashboard</a>.
-3. Check out your automatically updated <a href=https://www.datacamp.com/teach/repositories/52621483/go target="_blank">course on DataCamp</a>
-
-## Getting Started
-
-A DataCamp course consists of two types of files:
-
-- `course.yml`, a <a href=http://docs.ansible.com/ansible/YAMLSyntax.html target="_blank">YAML-formatted file</a> that's prepopulated with some general course information.
-- `chapterX.Rmd`, a markdown file with:
-   - a YAML header containing chapter information. 
-   - markdown chunks representing DataCamp Exercises. 
-
-To learn more about the structure of a DataCamp course, check out the <a href=http://www.datacamp.com/teach/documentation#tab_course_structure target="_blank">documentation</a>.
-
-Every DataCamp exercise consists of different parts, read up about them <a href=http://www.datacamp.com/teach/documentation#tab_code_exercises target="_blank">here</a>. A very important part about DataCamp exercises is to provide automated personalized feedback to students. In R, these so-called Submission Correctness Tests (SCTs) are written with the <a href=https://github.com/datacamp/testwhat target="_blank">`testwhat`</a> package. SCTs for Python exercises are coded up with <a href=https://github.com/datacamp/pythonwhat target="_blank">`pythonwhat`</a>. Check out the GitHub repositories' wiki pages for more information and examples.
-
-## Want to learn more?
-- Check out the <a href=http://www.datacamp.com/teach/documentation target="_blank">DataCamp Teach documentation</a>.
-- Check out DataCamp's blog posts:
-  - <a href=https://www.datacamp.com/community/blog/create-your-own-r-tutorials-with-github-datacamp target="_blank">Create a course with DataCamp Teach</a> 
-  - <a href=https://www.datacamp.com/community/blog/create-your-own-r-tutorials-with-github-datacamp target="_blank">Interpreting DataCamp Teach's build attempts</a>.
-
-*Happy teaching!*
diff --git a/chapter2.md b/chapter2.md
@@ -0,0 +1,123 @@
+---
+title_meta  : Unit 2
+title       : Estimating measurement error in continuous variables
+description : "Evaluating measurement error without a gold standard"
+attachments :
+  slides_link : http://jpsmonline.umd.edu/pluginfile.php/4810/mod_folder/content/0/SURV730-Unit-1-slides-2016-summer.pdf?forcedownload=1
+
+--- type:NormalExercise lang:r xp:50 skills:7  key:39a225af5a
+## Association measures for categorical variables
+
+
+* Yule's Q is explained on the Wikipedia page about "Goodman & Kruskal's gamma": <https://en.wikipedia.org/wiki/Goodman_and_Kruskal%27s_gamma#Yule.27s_Q>
+* The phi coefficient has its own Wikipedia page: <https://en.wikipedia.org/wiki/Phi_coefficient>
+* John Uebersax's page has a good overview of various association measures and their interrelatedness: <http://www.john-uebersax.com/stat/agree.htm>
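+
+For reference, the definitions behind these measures can be sketched for a 2x2 table. The cell labels $a$, $b$, $c$, $d$ are notation assumed here (they are not taken from the slides) and stand for `tab[1, 1]`, `tab[1, 2]`, `tab[2, 1]`, and `tab[2, 2]`:
+
+$$
+\mathrm{OR} = \frac{ad}{bc}, \qquad
+Q = \frac{ad - bc}{ad + bc} = \frac{\mathrm{OR} - 1}{\mathrm{OR} + 1}, \qquad
+\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}
+$$
+
+Both $Q$ and $\phi$ are zero when the odds ratio is 1, that is, when the two variables are independent in the table.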
+
+*** =instructions 
+- Look up the `phi` (&phi;) and `Yule` (Yule's Q) association measures online;
+- Verify that the base code to the right gives the same estimates as are given for `tab_marijuana` in the slides;
+- Adjust the code to the right to estimate association between `gndr` and `dshltgp` using the `tab_gender_GP` table;
+- Verify that you get the same results as in the slides.
+
+*** =hint
+- Replace `tab_marijuana` with `tab_gender_GP` in the code to the right;
+- In the call to `glm`, which fits a Poisson regression to the table counts so that the interaction coefficient is the log-odds ratio, adjust the variable names `Question.1` and `Question.2` to the names used in the `tab_gender_GP` table (`gndr` and `dshltgp`).
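+
+As a sketch of why the Poisson regression recovers the log-odds ratio: the symbols $\mu_{ij}$, $\beta$, $x_i$, and $z_j$ are notation assumed here, with R's default treatment coding so that $x_i$ and $z_j$ are 0/1 dummies for the second level of each variable.
+
+$$
+\log \mu_{ij} = \beta_0 + \beta_1 x_i + \beta_2 z_j + \beta_3 x_i z_j
+\quad\Longrightarrow\quad
+\beta_3 = \log \frac{\mu_{11}\,\mu_{22}}{\mu_{12}\,\mu_{21}}
+$$
+
+Because the model with the interaction is saturated, the fitted $\mu_{ij}$ equal the observed counts, so the interaction coefficient reported by `summary` should match the log-odds ratio computed by hand.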
+
+*** =solution
+```{r}
+# These are all from psych library:
+tab_gender_GP %>% cohen.kappa
+tab_gender_GP %>% tetrachoric
+tab_gender_GP %>% phi
+tab_gender_GP %>% Yule
+
+# Different ways of getting the log-odds ratio from a table
+
+# Calculate it directly using the vcd package
+tab_gender_GP %>% vcd::loddsratio(.)
+
+# Calculate it by hand
+my_log_odds_ratio <- function(tab) {
+  odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
+  log(odds_ratio)
+}
+tab_gender_GP %>% my_log_odds_ratio
+
+# Using a Poisson model for the counts 
+#    (useful when you want to use svyglm from the survey package
+#      to get design-based complex sampling standard errors)
+tab_gender_GP %>% as.data.frame %>% 
+  glm(Freq ~ gndr * dshltgp, data = ., family = poisson) %>%
+  summary
+
+# As a loglinear model 
+#     (useful when you want to test margins, can also use the survey
+#      package's loglinear function to adjust for complex sampling)
+# Note that the log-odds ratio is 4 times the loglinear interaction parameter.
+tab_gender_GP %>% loglin(margin = list(1:2), param = TRUE)
+
+```
+
+*** =sample_code
+```{r}
+library(psych)
+
+# These are all from psych library:
+tab_marijuana %>% cohen.kappa
+tab_marijuana %>% tetrachoric
+tab_marijuana %>% phi
+tab_marijuana %>% Yule
+
+# Different ways of getting the log-odds ratio from a table
+
+# Calculate it directly using the vcd package
+tab_marijuana %>% vcd::loddsratio(.)
+
+# Calculate it by hand
+my_log_odds_ratio <- function(tab) {
+  odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
+  log(odds_ratio)
+}
+tab_marijuana %>% my_log_odds_ratio
+
+# Using a Poisson model for the counts 
+#    (useful when you want to use svyglm from the survey package
+#      to get design-based complex sampling standard errors)
+tab_marijuana %>% as.data.frame %>% 
+  glm(Freq ~ Question.1 * Question.2, data = ., family = poisson) %>%
+  summary
+
+# As a loglinear model 
+#     (useful when you want to test margins, can also use the survey
+#      package's loglinear function to adjust for complex sampling)
+# Note that the log-odds ratio is 4 times the loglinear interaction parameter.
+tab_marijuana %>% loglin(margin = list(1:2), param = TRUE)
+
+
+```
+
+*** =pre_exercise_code
+```{r}
+library(psych)
+library(vcd)
+library(dplyr)
+
+load(url("http://daob.nl/files/SURV730/tab_marijuana.rdata"))
+
+load(url("http://daob.nl/files/SURV730/table_gender_GP.rdata"))
+
+options(digits = 4)
+
+```
+
+*** =sct
+
+```{r}
+# The first argument of psych::cohen.kappa() is named `x`, so that is the argument checked here
+test_function("cohen.kappa", args = "x",
+              not_called_msg = "You didn't call `cohen.kappa()`!",
+              incorrect_msg = "You didn't call `cohen.kappa(x = ...)` with the correct argument, `x`.")
+
+test_error()
+
+```
+