---
title: "My answers"
author: "My name"
date: "2021-10-22"
output: html_document
---
```{r setup, include = FALSE}
library(tidyverse)
```
### Judgment - part one
This short tutorial lets you explore the `dplyr` functionality covered in the previous lecture. Every question can be answered with a chain of `%>%` pipes. Avoid temporary variables and functions from outside the tidyverse.
The first part does not require joins or pivots.
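As a rough illustration of that style, the sketch below chains several `dplyr` verbs without creating any intermediate variable. It uses the built-in `mtcars` data set, which is chosen here only for illustration and is unrelated to the judgments data.

```r
library(dplyr)

# Heavier cars only, summarised per cylinder count, best mean consumption first.
mtcars %>%
  filter(wt > 3) %>%                          # keep cars with wt above 3 (in 1000 lbs)
  group_by(cyl) %>%                           # one group per number of cylinders
  summarise(mean_mpg = mean(mpg), n = n()) %>% # one row per group
  arrange(desc(mean_mpg))                     # highest mean first
```

Each verb receives the output of the previous one as its first argument, so the whole analysis reads top to bottom as a single statement.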
##### Import the [data from the website](https://biostat2.uni.lu/practicals/data/judgments.tsv).
Assign the result to the name `judgments`.
```{r}
# Write your answer here
```
##### Use `glimpse()` to identify columns and column types.
##### Select all columns that refer to the STAI questionnaire
```{r}
# Write your answer here
```
##### Select all subjects older than 25
```{r}
# Write your answer here
```
##### Retrieve all subjects younger than 20 who are in the stress group
The column for the group is `condition`.
```{r}
# Write your answer here
```
##### Abbreviate the gender column such that only the first character remains
```{r}
# Write your answer here
```
##### Normalize the values in the REI group
Divide all entries in the REI questionnaire by 5, the maximum value.
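The same kind of rescaling can be sketched on toy data. The tibble `toy` and its `item_*` columns are hypothetical and only stand in for a set of questionnaire columns scored up to 5; this is one possible approach using `across()`, not the only one.

```r
library(dplyr)

# Hypothetical toy data: two questionnaire items with a maximum score of 5.
toy <- tibble(
  subject = c("s1", "s2", "s3"),
  item_1  = c(5, 2.5, 1),
  item_2  = c(4, 3, 5)
)

# Divide every column whose name starts with "item_" by 5;
# the other columns are left untouched.
toy %>%
  mutate(across(starts_with("item_"), ~ .x / 5))
```

After the division all `item_*` values lie between 0 and 1, with 1 corresponding to the maximum score.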
```{r}
# Write your answer here
```
##### Find the moral dilemma with the highest average score across all participants.
The result should be a tibble containing the dilemma and the average, sorted such that the dilemma with the highest average is in the first row.
```{r}
# Write your answer here
```
##### Find the moral dilemma with the highest average score across all participants and also compute the median.
```{r}
# Write your answer here
```
##### Find the number of subjects by age, gender and condition
The table should include the number of 20-year-old females in the `stress` group.
Sort the resulting tibble so that the condition containing the most populous group comes first.
```{r}
# Write your answer here
```
## Part two
### Genetic variants
##### Clean the table of genetic *variants* such that each variant appears in a column labeled by its position.
Each entry in the `var` columns consists of:

* the reference allele (one of the DNA "letters" `A`, `C`, `G` or `T`),
* the position along this particular gene,
* the variant allele.

For example, in `T6G`, `T` is the reference allele, `6` is the position (along the gene) and `G` is the variant allele.
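To make the notation concrete, the sketch below splits two variant strings into their three parts with `tidyr::extract()`. The column names `ref`, `pos` and `alt` are illustrative assumptions, and the regular expression assumes that every entry follows the letter-digits-letter pattern described above.

```r
library(dplyr)

# Toy input: one variant string per row.
tibble(var = c("T6G", "G10C")) %>%
  # Three capture groups: reference allele, position (one or more digits),
  # variant allele. remove = FALSE keeps the original string for comparison.
  tidyr::extract(var, into = c("ref", "pos", "alt"),
                 regex = "^([ACGT])(\\d+)([ACGT])$", remove = FALSE) %>%
  # The captured position is text, so convert it explicitly.
  mutate(pos = as.integer(pos))
```

The position is converted with `as.integer()` rather than `convert = TRUE`, because automatic conversion could read an allele column that contains only `T` as logical values.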
```{r}
variants <- tribble(
~sampleid, ~var1, ~var2, ~var3,
"S1", "A3T", "T5G", "T6G",
"S2", "A3G", "T5G", NA,
"S3", "A3T", "T6C", "G10C",
"S4", "A3T", "T6C", "G10C"
)
```
Your result should look something like the following table.
sampleid | 3| 5| 6
:--------|-:|-:|-:
S1 | T| G| G
S2 | G| G| `NA`
```{r}
# Write your answer here
```
#### Select relevant variants
Genetic variants are labeled according to their effect on the stability of the gene product.
##### Select the subjects in table *variants* that carry variants labeled as *damaging*.
The final output should be a vector of sample IDs.
```{r}
variant_significance <- tribble(
~variant, ~significance,
"A3T", "unknown",
"A3G", "damaging",
"T5G", "benign",
"T6G", "damaging",
"T6C", "benign",
"G10C", "unknown"
)
```
```{r}
# Write your answer here
```
##### Try using `semi_join()` to achieve the same result. {.bonus}
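As a reminder of how a filtering join behaves, here is a minimal sketch on two toy tables. The tables `fruits` and `wanted` are hypothetical and unrelated to the variants data.

```r
library(dplyr)

# Hypothetical toy tables.
fruits <- tribble(
  ~name,    ~colour,
  "apple",  "red",
  "banana", "yellow",
  "cherry", "red"
)
wanted <- tribble(
  ~colour,
  "red"
)

# Keep the rows of `fruits` whose colour appears in `wanted`;
# no column from `wanted` is added to the result.
fruits %>%
  semi_join(wanted, by = "colour") %>%
  pull(name)
```

Unlike `inner_join()`, `semi_join()` only filters the left table, so a key that occurs several times in the right table never duplicates rows.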
```{r}
# Write your answer here
```