Commit 1ef617b

Author: Leah Wasser
Commit message: fixing incorrect data
1 parent 677afe4 commit 1ef617b
3 files changed: +17 -19 lines changed
_posts/courses/earth-analytics/02-time-series-data/get-to-know-r/2017-01-25-R05-missing-data-in-r.md

+17 -19
@@ -89,10 +89,11 @@ Then you can open the data.
 boulder_precip <- read.csv(file = "data/boulder-precip.csv")
 
 str(boulder_precip)
-## 'data.frame': 18 obs. of 3 variables:
-## $ X : int 756 757 758 759 760 761 762 763 764 765 ...
-## $ DATE : chr "2013-08-21" "2013-08-26" "2013-08-27" "2013-09-01" ...
+## 'data.frame': 18 obs. of 4 variables:
+## $ ID : int 756 757 758 759 760 761 762 763 764 765 ...
+## $ DATE : chr "8/21/13" "8/26/13" "8/27/13" "9/1/13" ...
 ## $ PRECIP: num 0.1 0.1 0.1 0 0.1 1 2.3 9.8 1.9 1.4 ...
+## $ TEMP : int 55 25 NA -999 15 25 65 NA 95 -999 ...
 ```
 
 In the example below, note how a mean value is calculated differently depending
@@ -103,28 +104,26 @@ upon on how `NA` values are treated when the data are imported.
 ```r
 # view mean values
 mean(boulder_precip$PRECIP)
-## [1] 1.055556
+## [1] 1.056
 mean(boulder_precip$TEMP)
-## Warning in mean.default(boulder_precip$TEMP): argument is not numeric or
-## logical: returning NA
 ## [1] NA
 ```
 
 Notice that you are able to calculate a mean value for `PRECIP` but `TEMP` returns a
 `NA` value. Why? Let's plot your data to figure out what might be going on.
 
 
-
 ```r
 library(ggplot2)
 # are there values in the TEMP column of your data?
 boulder_precip$TEMP
-## NULL
+## [1] 55 25 NA -999 15 25 65 NA 95 -999 85 -999 85 85
+## [15] -999 57 60 65
 # plot the data with ggplot
 ggplot(data = boulder_precip, aes(x = DATE, y = TEMP)) +
 geom_point() +
 labs(title = "Temperature data for Boulder, CO")
-## Error in FUN(X[[i]], ...): object 'TEMP' not found
+## Warning: Removed 2 rows containing missing values (geom_point).
 ```
 
 <img src="{{ site.url }}/images/rfigs/courses/earth-analytics/02-time-series-data/get-to-know-r/2017-01-25-R05-missing-data-in-r/quick-plot-1.png" title="quick plot of temperature" alt="quick plot of temperature" width="90%" />
@@ -159,11 +158,9 @@ temperature column above.
 ```r
 # calculate mean usign the na.rm argument
 mean(boulder_precip$PRECIP)
-## [1] 1.055556
+## [1] 1.056
 mean(boulder_precip$TEMP, na.rm = TRUE)
-## Warning in mean.default(boulder_precip$TEMP, na.rm = TRUE): argument is not
-## numeric or logical: returning NA
-## [1] NA
+## [1] -204.9
 ```
 
 
@@ -173,7 +170,7 @@ examples.
 {: .notice--success}
 
 So now you have successfully calculated the mean value of both precipitation and
-temperature in your spreadsheet. However does the mean temperature value (NA make
+temperature in your spreadsheet. However does the mean temperature value (-204.9375 make
 sense looking at the data? It seems a bit low - you know that there aren't temperature
 values of -200 here in Boulder, Colorado!
 
@@ -185,8 +182,8 @@ is -999.
 ```r
 # calculate mean usign the na.rm argument
 summary(boulder_precip$TEMP, na.rm = TRUE)
-## Length Class Mode
-## 0 NULL NULL
+## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
+## -999 -238 56 -205 70 95 2
 ```
 
 
@@ -213,7 +210,7 @@ This should solve all of your missing data problems!
 boulder_precip_na <- read.csv(file = "data/boulder-precip.csv",
 na.strings = c("NA", " ", "-999"))
 boulder_precip_na$TEMP
-## NULL
+## [1] 55 25 NA NA 15 25 65 NA 95 NA 85 NA 85 85 NA 57 60 65
 ```
 
 Does your new plot look better?
@@ -222,13 +219,14 @@ Does your new plot look better?
 ```r
 # are there values in the TEMP column of your data?
 boulder_precip$TEMP
-## NULL
+## [1] 55 25 NA -999 15 25 65 NA 95 -999 85 -999 85 85
+## [15] -999 57 60 65
 # plot the data with ggplot
 ggplot(data = boulder_precip_na, aes(x = DATE, y = TEMP)) +
 geom_point() +
 labs(title = "Temperature data for Boulder, CO",
 subtitle = "missing data accounted for")
-## Error in FUN(X[[i]], ...): object 'TEMP' not found
+## Warning: Removed 6 rows containing missing values (geom_point).
 ```
 
 <img src="{{ site.url }}/images/rfigs/courses/earth-analytics/02-time-series-data/get-to-know-r/2017-01-25-R05-missing-data-in-r/plot-2nodata-1.png" title="Plot of temperature with missing data accounted for" alt="Plot of temperature with missing data accounted for" width="90%" />
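
Review note (not part of the commit): the `na.strings` behaviour that the corrected output documents can be checked without the data file. The sketch below is a minimal illustration, assuming the first four rows visible in the `str()` output stand in for `data/boulder-precip.csv`. The inline `csv_text` and the names `raw`, `clean`, `mean_raw`, and `mean_clean` are hypothetical and do not appear in the lesson.

```r
# Hypothetical inline stand-in for data/boulder-precip.csv, built from the
# first four rows shown in the str() output (an assumption for illustration).
csv_text <- "ID,DATE,PRECIP,TEMP
756,8/21/13,0.1,55
757,8/26/13,0.1,25
758,8/27/13,0.1,NA
759,9/1/13,0,-999"

# Default import: only the string "NA" becomes a missing value,
# so -999 is read as a real temperature.
raw <- read.csv(text = csv_text)

# Import with the sentinel value declared as missing, as in the lesson.
clean <- read.csv(text = csv_text, na.strings = c("NA", " ", "-999"))

# na.rm = TRUE drops NA values but cannot know that -999 means "no data".
mean_raw <- mean(raw$TEMP, na.rm = TRUE)     # pulled far below zero by -999
mean_clean <- mean(clean$TEMP, na.rm = TRUE) # uses only 55 and 25

print(mean_raw)
print(mean_clean)
```

In this sketch `na.rm = TRUE` alone still leaves the sentinel in the calculation, which matches the implausible mean the corrected lesson output reports. Declaring `-999` in `na.strings` at import time is what removes it.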