---
title: "Regression to the mean in pre-post-designs"
---
Assumption: In a pre-post treatment-control design, I would assume that the regression weight from pre to post is 1:

$$BDI_{post} = b_0 + 1 \cdot BDI_{pre} + e$$

In the full model with a treatment indicator, this corresponds to assuming $b_1 = 1$ in

$$BDI_{post} = b_0 + b_1 \cdot BDI_{pre} + b_2 \cdot \text{treatment} + e$$

Hence, the pre value is simply carried forward to the post value. We add random error, since participants naturally fluctuate up and down, but there is no systematic trend.
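A short derivation may help pin down how the weight-1 assumption relates to the simulation below. It assumes ordinary least squares; $\rho$ (the pre-post correlation), $\sigma_{pre}$, $\sigma_{post}$, $\sigma_\xi^2$ and $\sigma_e^2$ are symbols introduced here for illustration. The OLS slope of post on pre is

$$b_1 = \frac{\operatorname{Cov}(BDI_{pre}, BDI_{post})}{\operatorname{Var}(BDI_{pre})} = \rho \, \frac{\sigma_{post}}{\sigma_{pre}}$$

With equal variances at both time points ($\sigma_{pre} = \sigma_{post}$, as with the value 117 used in the simulation), this reduces to $b_1 = \rho$, so a weight of 1 requires $\rho = 1$. Conversely, if the pre value is carried forward with independent error, $BDI_{post} = BDI_{pre} + e$, then

$$\operatorname{Var}(BDI_{post}) = \operatorname{Var}(BDI_{pre}) + \operatorname{Var}(e)$$

so the variance must grow from pre to post. In the true-score version used in the chunk ($BDI_t = \xi + e_t$ with independent errors of equal variance $\sigma_e^2$), the slope is attenuated to the reliability, which under equal variances is also the population value of `cor(pre, post)`:

$$b_1 = \frac{\sigma_\xi^2}{\sigma_\xi^2 + \sigma_e^2} = \frac{117}{117 + 4} \approx 0.967$$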
```{r}
library(Rfast)    # rmvnorm()
library(tidyr)    # pivot_longer()
library(ggplot2)

set.seed(42)      # reproducible rnorm() draws
n <- 10000
pre_post_cor <- 0.9
mu <- c(23, 23)
bdi_var <- 117    # variance of BDI at both time points
# Covariance = correlation * sd_pre * sd_post = pre_post_cor * 117
sigma <- matrix(c(bdi_var, pre_post_cor * bdi_var,
                  pre_post_cor * bdi_var, bdi_var), nrow = 2, byrow = TRUE)
BDI <- rmvnorm(n, mu, sigma) |> data.frame()
names(BDI) <- c("pre", "post")
BDI$id <- seq_len(nrow(BDI))

# Alternative data-generating model: a shared true score xi plus independent
# measurement error at each time point (weight 1 from xi to both pre and post)
xi <- rnorm(n, mean = 23, sd = sqrt(bdi_var))
pre <- 1 * xi + rnorm(n, mean = 0, sd = 2)
post <- 1 * xi + rnorm(n, mean = 0, sd = 2)
cor(pre, post)

colMeans(BDI[, c("pre", "post")])
var(BDI[, c("pre", "post")])
summary(lm(post ~ pre, BDI))

BDI_long <- pivot_longer(BDI, c(pre, post), names_to = "time")
BDI_long$time <- factor(BDI_long$time, levels = c("pre", "post"))
ggplot(BDI_long, aes(x = time, y = value, group = id)) + geom_line()

# Typical RTM pattern: the pre-post difference is negatively related to the
# pre score (high scorers tend to drop, low scorers tend to rise)
BDI$diff <- BDI$post - BDI$pre
BDI$absdiff <- abs(BDI$diff)
hist(BDI$diff)
mean(BDI$diff)
summary(lm(diff ~ pre, BDI))
ggplot(BDI, aes(x = pre, y = absdiff)) + geom_point() + geom_smooth()
ggplot(BDI, aes(x = pre, y = post)) + geom_point() + geom_smooth()

BDI$predicted <- predict(lm(post ~ pre, BDI))
mean(BDI$predicted)
var(BDI$predicted)
# The predicted values have the same mean as the observed post scores (no
# systematic change, as expected), but a smaller variance: roughly
# pre_post_cor^2 = 0.81 times the variance of post.
BDI_long2 <- pivot_longer(BDI, c(pre, post, predicted), names_to = "cat")
BDI_long2$cat <- factor(BDI_long2$cat, levels = c("pre", "post", "predicted"))
ggplot(BDI_long2, aes(x = cat, y = value)) +
  ggdist::stat_halfeye(adjust = .5, width = .3, .width = 0, justification = -.3, point_colour = NA) +
  geom_boxplot(width = .1, outlier.shape = NA) +
  gghalves::geom_half_point(side = "l", range_scale = .4, alpha = .5)
```
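The two patterns noted in the chunk comments follow from exact in-sample OLS identities: the fitted values have the same mean as post and a variance of $r^2 \cdot \operatorname{Var}(post)$, and the slope of the change score on pre equals $b_1 - 1$. The sketch below checks both on freshly simulated data. It assumes bivariate normal scores generated with base R instead of `Rfast::rmvnorm()`; the `_sim` names are hypothetical and chosen so the objects from the main chunk are not overwritten.

```{r}
# Sketch (base R only): two in-sample OLS identities behind the plots above.
# n, rho, the seed, and all variable names are illustrative choices.
set.seed(1)
n_sim   <- 10000
mu_sim  <- 23
sd_sim  <- sqrt(117)
rho_sim <- 0.9

# Bivariate normal pre/post: equal means, equal variances (117), correlation rho_sim
pre_sim  <- rnorm(n_sim, mu_sim, sd_sim)
post_sim <- mu_sim + rho_sim * (pre_sim - mu_sim) +
  sqrt(1 - rho_sim^2) * rnorm(n_sim, 0, sd_sim)

fit_sim  <- lm(post_sim ~ pre_sim)
b1_sim   <- coef(fit_sim)[["pre_sim"]]
pred_sim <- fitted(fit_sim)

# 1) Fitted values reproduce the mean of post but keep only the share r^2 of its variance
var_ratio <- var(pred_sim) / var(post_sim)
c(mean_post = mean(post_sim), mean_pred = mean(pred_sim))
c(var_ratio = var_ratio, r_squared = cor(pre_sim, post_sim)^2)

# 2) Regressing the change score on pre yields slope b1 - 1, negative whenever b1 < 1
b_diff <- coef(lm(I(post_sim - pre_sim) ~ pre_sim))[["pre_sim"]]
c(b_diff = b_diff, b1_minus_1 = b1_sim - 1)
```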
Suppose we take the predicted scores at t2 (post) and use them to predict the scores at t3. Will the scores shrink further with every step, until all data points sit at the mean?
Such shrinkage is not substantively reflected in the raw scores, though. Is it an artifact of regression?
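One way to explore this question is a small simulation: apply the fitted t1 → t2 equation repeatedly to its own predictions, and compare with raw scores generated wave by wave from a process with the same correlation and constant variance (an assumed stationary model, not part of the original analysis). Because each step multiplies the spread by $b_1$, the predicted variance after $k$ waves is $b_1^{2k} \operatorname{Var}(t_1)$, which tends to zero as the predictions converge on the fixed point $b_0 / (1 - b_1)$, while the raw variance stays near 117. In this sketch the shrinkage comes from dropping the residual $e$: predicted values are conditional means, not new observations. All names below are hypothetical.

```{r}
# Sketch: iterate the fitted t1 -> t2 equation on its own predictions, versus
# drawing new raw scores wave by wave from a stationary process with the same
# correlation. The process, seed, and all names are illustrative assumptions.
set.seed(2)
n_it   <- 10000
mu_it  <- 23
sd_it  <- sqrt(117)
rho_it <- 0.9
waves  <- 6

# One wave of raw scores: conditional mean plus fresh residual noise,
# which keeps the variance at 117 from wave to wave
next_raw <- function(x) {
  mu_it + rho_it * (x - mu_it) + sqrt(1 - rho_it^2) * rnorm(length(x), 0, sd_it)
}

t1 <- rnorm(n_it, mu_it, sd_it)
t2 <- next_raw(t1)
fit_it <- lm(t2 ~ t1)
b0_it <- coef(fit_it)[[1]]
b1_it <- coef(fit_it)[[2]]

raw_k  <- t1
pred_k <- t1
var_raw  <- numeric(waves)
var_pred <- numeric(waves)
for (k in seq_len(waves)) {
  raw_k  <- next_raw(raw_k)          # new observation: includes the residual e
  pred_k <- b0_it + b1_it * pred_k   # prediction only: the residual e is dropped
  var_raw[k]  <- var(raw_k)
  var_pred[k] <- var(pred_k)
}
round(rbind(wave = seq_len(waves), var_raw, var_pred), 1)
```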