Commit 8a2b0c0

Merge pull request #689 from jhudsl/datasets-info-added
Added details about where the datasets come from
2 parents 991de72 + ed82c1f commit 8a2b0c0

3 files changed, +135 -0 lines changed

materials_schedule.Rmd

+7
@@ -40,3 +40,10 @@ pander::pandoc.table(
 )
 ```
 
+<br>
+
+## Data
+
+You can see an overview of the datasets used [here](https://jhudatascience.org/intro_to_r/resources/Datasets.html).
+
+

resources/Datasets.Rmd

+127
@@ -0,0 +1,127 @@
+---
+title: "Datasets Used in This Course"
+output: html_document
+---
+
+```{r, echo = FALSE}
+library(knitr)
+library(readr)
+opts_chunk$set(comment = "")
+```
+
+The following are datasets used in this course.
+
+## Annual Dosage
+
+Number of shipments (count) of either oxycodone or hydrocodone pills (DOSAGE_UNIT).
+
+* Source: https://www.washingtonpost.com/graphics/2019/investigations/dea-pain-pill-database/
+* URL: https://jhudatascience.org/intro_to_r/data/annualDosage.csv
+* Modules: Data Subsetting
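Each entry's URL field points at a file under the course's data directory, so a dataset can be read straight from the web. The following is a minimal sketch, assuming `readr` is installed and the site is reachable. The helper names `dataset_url` and `read_course_csv` are hypothetical and are not part of the course materials.

```r
# Base of the https data URLs listed in this file.
base_url <- "https://jhudatascience.org/intro_to_r/data/"

# Hypothetical helper: build the full URL for a listed file name.
dataset_url <- function(file_name) {
  paste0(base_url, file_name)
}

# Hypothetical helper: download and parse a listed CSV.
# readr::read_csv() accepts an http(s) URL in place of a local path.
read_course_csv <- function(file_name) {
  readr::read_csv(dataset_url(file_name))
}

annual_dosage_url <- dataset_url("annualDosage.csv")
print(annual_dosage_url)
```

Calling `read_course_csv("annualDosage.csv")` would then return the Annual Dosage data as a tibble. That call needs network access, so it is left out of the block itself.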
+
+## Bike Lanes
+
+Existing bike facilities throughout the City of Baltimore, as recognized by the Bike Baltimore program of the Baltimore City Department of Transportation.
+
+* Source: Modified from https://data.baltimorecity.gov/datasets/baltimore::dot-bmc-bike-facilities/about (the source has been updated since our version was downloaded)
+* URL: http://jhudatascience.org/intro_to_r/data/Bike_Lanes.csv
+* Modules: Data Summarization, Data Cleaning, Data Visualization
+
+## Car Auctions
+
+Kaggle dataset on car auctions.
+
+* Source: https://www.kaggle.com/datasets/tunguz/used-car-auction-prices
+* URL: http://jhudatascience.org/intro_to_r/data/kaggleCarAuction.csv
+* Modules: Statistics, Functions
+
+## Charm City Circulator
+
+This dataset describes ridership on the Baltimore Circulator, a free bus system.
+
+* Source: https://data.baltimorecity.gov (no longer available)
+* URL: https://jhudatascience.org/intro_to_r/data/circulator_long.csv
+* Modules: RStudio, Data Classes, Manipulating Data, Statistics
+
+## Child Mortality
+
+* Source: Original source unclear. Possibly https://www.gapminder.org/data/documentation/gd005/
+* URL: https://jhudatascience.org/intro_to_r/data/mortality.csv
+* Modules: Statistics, Functions
+
+## Colorado Heat Wave ER Visits
+
+This dataset contains the number and rate of emergency room visits for heat-related illness in Colorado from 2011 to 2022, adjusted for age.
+
+* Source: https://coepht.colorado.gov/heat-related-illness
+* URL: https://jhudatascience.org/intro_to_r/data/CO_ER_heat_visits.csv
+* Modules: Esquisse Data Visualization
+
+## County Pop
+
+Modified data from US Census population by county.
+
+* Source: https://www.census.gov/library/publications/2011/compendia/usa-counties-2011.html#POP
+* URL: https://jhudatascience.org/intro_to_r/data/county_pop.csv
+* Modules: Data Subsetting
+
+## Dropouts
+
+Data on student dropouts from the State of California during the 2016-2017 school year.
+
+* Source: https://www.cde.ca.gov/ds/ad/filesdropouts.asp
+* URL: http://jhudatascience.org/intro_to_r/data/dropouts.txt
+* Modules: Factors
+
+## Income by State
+
+Personal income, in current dollars, increased in 49 states and the District of Columbia, with the percent change ranging from 5.4 percent at an annual rate in Arkansas to –0.7 percent in North Dakota. Early 2022 release.
+
+* Source: Bureau of Economic Analysis (https://www.bea.gov/data/income-saving/personal-income-by-state)
+* URL: http://jhudatascience.org/intro_to_r/data/gdp_personal_income.csv
+* Modules: Manipulating Data
+
+## mtcars
+
+Classic dataset in R. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973--74 models).
+
+* Source: https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars
+* URL: none
+* Modules: Data Subsetting, Data Summarization
+
+## Orange
+
+Dataset in R. The Orange data frame has 35 rows and 3 columns of records of the growth of orange trees.
+
+* Source:
+* URL: none
+* Modules: Data Visualization
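`mtcars` and `Orange` list no URL because both ship with R's built-in `datasets` package, so they are available without downloading anything. Here is a short sketch that checks the dimensions quoted in the two descriptions. The variable names `mtcars_dim` and `orange_dim` are only for illustration.

```r
# Both data frames come from the datasets package, which R attaches by default.
# mtcars: one row per car; columns are mpg plus 10 design/performance measures.
mtcars_dim <- dim(datasets::mtcars)

# Orange: one row per growth record; columns are Tree, age, circumference.
orange_dim <- dim(datasets::Orange)

print(mtcars_dim)
print(orange_dim)

# Peek at the first few rows of each.
print(head(datasets::mtcars, 3))
print(head(datasets::Orange, 3))
```

The `datasets::` prefix is optional in an ordinary session. It is used here only to make the origin of each data frame explicit.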
+
+## Tuberculosis Incidence
+
+* Source: Original source unclear. Likely modified from https://www.who.int/data/gho/data/indicators/indicator-details/GHO/incidence-of-tuberculosis-(per-100-000-population-per-year) (note that data has been updated at the source since this data was downloaded)
+* URL: https://jhudatascience.org/intro_to_r/data/tb.csv
+* Modules: Data Summarization
+
+## Vaccinations
+
+Overall US COVID-19 Vaccine deliveries and administration data at national and jurisdiction level. Data represents all vaccine partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.
+
+* Source: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc
+* URL: https://jhudatascience.org/intro_to_r/data/vaccinations.csv
+* Modules: Data Input, Data Output
+
+Similar data formatted a bit differently.
+
+* Source: https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total - snapshot from January 12, 2022 (since taken down)
+* URL: http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinations.csv
+* Modules: Manipulating Data, Functions
+
+## Youth Tobacco Survey
+
+YTS was developed to provide states with comprehensive data on both middle school and high school students regarding tobacco use, exposure to environmental tobacco smoke, smoking cessation, school curriculum, minors' ability to purchase or otherwise obtain tobacco products, knowledge and attitudes about tobacco, and familiarity with pro-tobacco and anti-tobacco media messages.
+
+* Source: https://catalog.data.gov/dataset/youth-tobacco-survey-yts-data
+* URL: http://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv
+* Modules: Data Input, Data Summarization, Factors, Data Output
+

resources/dictionary.txt

+1
@@ -40,6 +40,7 @@ ctrl
 codesmall
 CoursePlus
 CoV
+COVID
 cran
 csavone
 css
