---
title: "GEOG606 - HW1"
author: "Anika Cartas"
date: "Saturday, February 14, 2015"
output: pdf_document
---
#### 1. Solve the following linear equations:

$$
\begin{aligned}
-x + 2y &= 1 \\
3x + y &= 3
\end{aligned}
$$
```{r}
left = matrix(c(-1,3,2,1),2,2); left #coefficients, filled column by column
right = c(1,3); right
solve(left,right)
```
**If we add another equation, x + y = 2, is there a solution for the set of 3 equations?**
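* No. The first two equations have the unique solution x = 5/7, y = 6/7, which gives x + y = 11/7 rather than 2, so the three equations are inconsistent. In addition, `solve()` requires a square coefficient matrix.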
**If not, could you give a best estimate of x and y that satisfy the equations?**
* To find a best estimate, plot all three lines and take the centerpoint of the triangle formed by their pairwise intersections.
```{r}
#Plot all three lines; abline(a,b) draws y = a + b*x
plot(NULL,xlim=c(0,2.5),ylim=c(0,3.5), xlab="x",ylab="y")
abline(1/2,1/2, col="red") #first equation: y = 1/2 + x/2
abline(3,-3,col="green") #second equation: y = 3 - 3x
abline(2,-1,col="blue") #third equation: y = 2 - x
#Crosshairs at the estimated centerpoint of the triangle
abline(h=1.1)
abline(v=0.73)
mtext("Rough Estimate: x=0.73, y=1.1")
```
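
The centerpoint read off the plot can also be computed directly. The sketch below is not part of the original assignment: it finds the three pairwise intersections with `solve()`, averages them to get the triangle's centroid, and, as an alternative notion of "best estimate", computes the least-squares solution of the overdetermined system with `qr.solve()`. The names `A`, `b`, `pairs`, `corners`, `centroid`, and `lsq` are introduced here for illustration only.

```{r}
# Coefficients and right-hand sides of all three equations (one row each)
A = matrix(c(-1, 2,
              3, 1,
              1, 1), nrow=3, byrow=TRUE)
b = c(1, 3, 2)

# Intersection of every pair of lines: solve each 2x2 subsystem
pairs = combn(3, 2)
corners = apply(pairs, 2, function(idx) solve(A[idx,], b[idx]))
corners   # one column per corner of the triangle

# Centroid of the triangle = mean of its three corners
centroid = rowMeans(corners); centroid

# Least-squares estimate: minimizes the sum of squared residuals of A %*% c(x,y) - b
lsq = qr.solve(A, b); lsq
```

The centroid comes out near x = 0.74, y = 1.12, in line with the rough estimate from the plot, while the least-squares estimate is near x = 0.74, y = 0.92. The two criteria answer slightly different questions, so they need not agree.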
#### 2. Calculate the covariance matrix of the following $3 \times 2$ data matrix:
First, calculate by hand:
$A = \left( \begin{array}{cc} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{array} \right)$
1. Calculate the matrix of deviation scores, where $\mathbf{1}$ is a column vector of $n$ ones and $n$ is the number of rows of $A$:
$$
\begin{aligned}
a &= A - \mathbf{1}\mathbf{1}^T A / n \\
a &= \left( \begin{array}{cc} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{array} \right) - \left( \begin{array}{ccc} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array} \right)\left( \begin{array}{cc} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{array} \right) \cdot\frac{1}{3} \\
a &= \left( \begin{array}{cc} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{array} \right) - \left(\begin{array}{cc} 6 & 6 \\ 6 & 6 \\ 6 & 6\end{array} \right)\cdot\frac{1}{3} \\
a &= \left( \begin{array}{cc} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{array} \right) - \left(\begin{array}{cc} 2 & 2 \\ 2 & 2 \\ 2 & 2\end{array} \right) \\
a &= \left( \begin{array}{cc} -1 & 1 \\ 0 & 0 \\ 1 & -1 \end{array}\right)
\end{aligned}
$$
2. Calculate the sample covariance matrix, dividing by $n - 1$, where $n = 3$ is the number of rows of $a$:
$$
\begin{aligned}
V &= a^T\cdot a/(n-1) \\
V &= \left( \begin{array}{ccc} -1 & 0 & 1 \\ 1 & 0 & -1 \end{array} \right) \cdot \left( \begin{array}{cc} -1 & 1 \\ 0 & 0 \\ 1 & -1 \end{array}\right) \cdot \frac{1}{2} \\
V &= \left( \begin{array}{cc} 2 & -2 \\ -2 & 2\end{array} \right) \cdot \frac{1}{2} \\
V &= \left( \begin{array}{cc} 1 & -1 \\ -1 & 1\end{array} \right)
\end{aligned}
$$
Now, calculate using R. So much easier!
```{r}
x = matrix(c(1,2,3,3,2,1),3,2); x
cov(x)
```
**What does the covariance tell you about the data?**
* The covariance matrix shows the variances of the two variables (x,y) along the diagonal, and the covariances between them in the other positions.
* In this case, the variance of both x and y is 1, while their covariance is -1, meaning that as x grows y shrinks, and vice versa.
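
The hand calculation can also be reproduced step by step in R and compared with `cov()`. This is a minimal sketch; the names `A`, `ones`, `a`, and `V` mirror the symbols used in the derivation and are otherwise arbitrary. Note that the factor 1/2 in the hand calculation corresponds to dividing by n - 1, which is also what `cov()` does.

```{r}
A = matrix(c(1,2,3,3,2,1), 3, 2)
n = nrow(A)
ones = matrix(1, n, n)        # n x n matrix of ones
a = A - ones %*% A / n        # deviation scores: subtract each column's mean
V = t(a) %*% a / (n - 1)      # sample covariance, dividing by n - 1
V
all.equal(V, cov(A))          # TRUE when the two agree
cor(A)                        # covariance rescaled by the standard deviations
```

Here the off-diagonal correlation is -1, because the three points (1,3), (2,2), (3,1) lie exactly on a line with negative slope.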
#### 3. Use the following dataset to demonstrate the Central Limit Theorem:
```{r}
data = c(3,2,3,4,3,2,5,4,5,2,3,3,10,4,5,4,12,15,10,17,15,18,16,14,9,10,11,18,17,18,16,17,18)
```
**3.1. Randomly select 5 values and calculate their mean.**
```{r}
s1 = sample(data,5); s1
mean(s1)
```
**Repeat 3 times, and calculate the mean of the sampling mean. Then, repeat 5, 10, 20, and 50 times.**
```{r}
#For each n in 'sizes', draw n samples of 'sampleSize' values from 'data'
#(without replacement) and average the n sample means.
#Returns a matrix with columns Times.Sampled and Mean, one row per n.
takeSamples <- function(data, sizes, sampleSize){
  #Preallocate one row per requested number of repeats
  allMeans = matrix(NA, length(sizes), 2)
  for (k in seq_along(sizes)){
    n = sizes[k]
    #Means of n independent samples of size 'sampleSize'
    curMeans = replicate(n, mean(sample(data, sampleSize)))
    allMeans[k,] = c(n, mean(curMeans))
  }
  colnames(allMeans) = c("Times.Sampled","Mean")
  allMeans #return value
}
```
The requested means from 3, 5, 10, 20, and 50 repetitions of sampling 5 values:
```{r}
means.5 = takeSamples(data,c(3,5,10,20,50),sampleSize=5); means.5
```
**Use a plot to show how the mean of the sampling mean changes as the number of repeats increases.**
```{r}
#Out of curiosity, plotting larger sizes of samples (first, so that axes match up)
means.5.500 = takeSamples(data,seq(1,500,5),sampleSize=5)
plot(means.5.500, main="Means of Various Samples of 5 from Dataset", col="grey", xlab="Times Sampled", ylim=c(8,11))
#Request from assignment in red
points(means.5,col="red")
mtext("With a large enough number of samples, convergence is noticeable.")
```
**3.2. Repeat 3.1. but select 10 values during each random selection. Explain these plots in relation to the Central Limit Theorem.**
```{r}
means.10.500 = takeSamples(data,seq(1,500,5),sampleSize=10)
plot(means.10.500, main="Means of Various Samples of 10 from Dataset", col="grey", xlab="Times Sampled", ylim=c(8,11))
#Request from assignment in red
means.10 = takeSamples(data,c(3,5,10,20,50),sampleSize=10)
points(means.10, col="red")
mtext("A similar convergence to the previous example is noticeable.")
```
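
One way to read the two plots against the theorem, stated here as standard sampling theory rather than something derived in the assignment: for samples of size $n$ drawn without replacement (the default for `sample()`) from a dataset of $N$ values with mean $\mu$ and standard deviation $\sigma$ (computed with divisor $N$), the sample mean $\bar{X}$ satisfies

$$
\begin{aligned}
E[\bar{X}] &= \mu \\
\mathrm{SD}(\bar{X}) &= \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}
\end{aligned}
$$

Here $N = 33$ and $\mu = 313/33 \approx 9.48$, so both plots should settle near the same value, and the points for $n = 10$ should scatter less than those for $n = 5$.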
**3.3. How does the sampling mean approach normal distribution in these experiments? Use histograms to show.**
```{r}
hist(means.5[,2], col=rgb(0,0,1,0.5), xlim=c(7,10), main="Distribution of Means of Means (3 to 50 Repeats)", xlab="Mean")
hist(means.10[,2], col=rgb(0,1,0,0.5), add=TRUE) #overlay; titles are taken from the first hist() call
legend("topleft", c("Sample(5)","Sample(10)"),col=c(rgb(0,0,1,0.5),rgb(0,1,0,0.5)), lwd=10)
box() #decorative box around data
mtext("With only five means per histogram, no shape is discernible.")
hist(means.5.500[,2], col=rgb(0,0,1,0.5), main="Distribution of Means of Means (1 to 496 Repeats)", xlab="Mean")
hist(means.10.500[,2], col=rgb(0,1,0,0.5), add=TRUE) #overlay; titles are taken from the first hist() call
legend("topright", c("Sample(5)","Sample(10)"),col=c(rgb(0,0,1,0.5),rgb(0,1,0,0.5)), lwd=10)
box() #decorative box around data
mtext("With more samples, a roughly normal curve centered near 9.5 is noticeable.")
```
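
The histograms above show means of means for varying numbers of repeats. The theorem itself concerns the distribution of the individual sample means, so a complementary view is to histogram those directly. The sketch below is an addition, not part of the original answer: the seed, the 2000 repetitions, the break width, and the names `sampleMeans.5` and `sampleMeans.10` are all arbitrary choices made for illustration.

```{r}
# Same dataset as above, repeated so this chunk runs on its own
data = c(3,2,3,4,3,2,5,4,5,2,3,3,10,4,5,4,12,15,10,17,15,18,16,14,9,10,11,18,17,18,16,17,18)

set.seed(606)   # arbitrary seed, only so the figure is reproducible
# 2000 individual sample means for each sample size
sampleMeans.5  = replicate(2000, mean(sample(data, 5)))
sampleMeans.10 = replicate(2000, mean(sample(data, 10)))

# Overlay the two distributions on a common set of breaks
breaks = seq(2, 18, by=0.5)   # the data range from 2 to 18, so every mean falls inside
hist(sampleMeans.5, breaks=breaks, col=rgb(0,0,1,0.5), main="Distribution of Sample Means", xlab="Sample mean")
hist(sampleMeans.10, breaks=breaks, col=rgb(0,1,0,0.5), add=TRUE)
abline(v=mean(data), lty=2)   # mean of the full dataset
legend("topright", c("Sample(5)","Sample(10)"), col=c(rgb(0,0,1,0.5),rgb(0,1,0,0.5)), lwd=10)

# Spread of the sample means: expected to be smaller for the larger sample size
c(sd5 = sd(sampleMeans.5), sd10 = sd(sampleMeans.10))
```

Both histograms should be centered near the dataset mean (dashed line), with the samples of 10 forming the narrower, more bell-shaped distribution.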