
Commit 2d6ed41

fix: paths
1 parent 11afd12 commit 2d6ed41

File tree: 2 files changed, +18 −18 lines


README.md (+13 −13)
@@ -93,14 +93,14 @@ Example plot of this data: https://s13.gifyu.com/images/SCGH2.gif (code here: ht
 
 Example visualization: live demo here - https://jaanli.github.io/american-community-survey/ (visualization code [here](https://github.com/jaanli/american-community-survey/))
 
-![image](https://github.com/jaanli/exploring_american_community_survey_data/assets/5317244/0428e121-c4ec-4a97-826f-d3f944bc7bf2)
+![image](https://github.com/jaanli/exploring_data_processing_data/assets/5317244/0428e121-c4ec-4a97-826f-d3f944bc7bf2)
 
 ## Requirements
 
 Clone the repo; create and activate a virtual environment:
 ```
-git clone https://github.com/jaanli/exploring_american_community_survey_data.git
-cd exploring_american_community_survey_data
+git clone https://github.com/jaanli/american-community-survey.git
+cd american-community-survey
 python3 -m venv .venv
 source activate
 ```
@@ -124,7 +124,7 @@ brew install duckdb
 
 To retrieve the list of URLs from the Census Bureau's server and download and extract the archives for all of the 50 states' PUMS files, run the following:
 ```
-cd american_community_survey
+cd data_processing
 dbt run --exclude "public_use_microdata_sample.generated+" --vars '{"public_use_microdata_sample_url": "https://www2.census.gov/programs-surveys/acs/data/pums/2022/1-Year/", "public_use_microdata_sample_data_dictionary_url": "https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMS_Data_Dictionary_2022.csv", "output_path": "~/data/american_community_survey"}'
 ```
 
@@ -144,7 +144,7 @@ dbt run --select "public_use_microdata_sample.generated+" --vars '{"public_use_m
 Inspect the output folder to see what has been created in the `output_path` specified in the previous command:
 ```
 ❯ tree -hF -I '*.pdf' ~/data/american_community_survey
-[ 224] /Users/me/data/american_community_survey/
+[ 224] /Users/me/data/data_processing/
 ├── [ 128] 2022/
 │   └── [3.4K] 1-Year/
 │       ├── [ 128] csv_hak/
@@ -169,7 +169,7 @@ To see the size of the csv output:
 
 ```
 ❯ du -sh ~/data/american_community_survey/2022
-6.4G /Users/me/data/american_community_survey/2022
+6.4G /Users/me/data/data_processing/2022
 ```
 
 And the compressed representation size:
@@ -278,12 +278,12 @@ Check that you can execute a SQL query against these files:
 ```
 duckdb -c "SELECT COUNT(*) FROM '~/data/american_community_survey/*individual_people_united_states*2021.parquet'"
 ```
-6. Create a data visualization using the compressed parquet files by adding to the `american_community_survey/models/public_use_microdata_sample/figures` directory, and using examples from here https://github.com/jaanli/american-community-survey/ or here https://github.com/jaanli/lonboard/blob/example-american-community-survey/examples/american-community-survey.ipynb
+6. Create a data visualization using the compressed parquet files by adding to the `data_processing/models/public_use_microdata_sample/figures` directory, and using examples from here https://github.com/jaanli/american-community-survey/ or here https://github.com/jaanli/lonboard/blob/example-american-community-survey/examples/american-community-survey.ipynb
 
-To save time, there is a bash script with these steps in `scripts/process_one_year_of_american_community_survey_data.sh` that can be used as follows:
+To save time, there is a bash script with these steps in `scripts/process_one_year_of_data_processing_data.sh` that can be used as follows:
 ```
-chmod a+x scripts/process_one_year_of_american_community_survey_data.sh
-./scripts/process_one_year_of_american_community_survey_data.sh 2021
+chmod a+x scripts/process_one_year_of_data_processing_data.sh
+./scripts/process_one_year_of_data_processing_data.sh 2021
 ```
 
 The argument specifies the year to be downloaded, transformed, compressed, and saved. It takes about 5 minutes per year of data.
@@ -564,7 +564,7 @@ dbt run --select "public_use_microdata_sample.microdata_area_shapefile_paths"
 ```
 5. Check that the paths are correct:
 ```
-❯ duckdb -c "SELECT * FROM '/Users/me/data/american_community_survey/microdata_area_shapefile_paths.parquet';"
+❯ duckdb -c "SELECT * FROM '/Users/me/data/data_processing/microdata_area_shapefile_paths.parquet';"
 ```
 Displays:
 
@@ -573,11 +573,11 @@ Displays:
 │ shp_path │
 │ varchar │
 ├─────────────────────────────────────────────────────────────────────────────────────────────┤
-│ /Users/me/data/american_community_survey/PUMA5/2010/tl_2010_02_puma10/tl_2010_02_puma10.shp │
+│ /Users/me/data/data_processing/PUMA5/2010/tl_2010_02_puma10/tl_2010_02_puma10.shp │
 │ · │
 │ · │
 │ · │
-│ /Users/me/data/american_community_survey/PUMA5/2010/tl_2010_48_puma10/tl_2010_48_puma10.shp │
+│ /Users/me/data/data_processing/PUMA5/2010/tl_2010_48_puma10/tl_2010_48_puma10.shp │
 ├─────────────────────────────────────────────────────────────────────────────────────────────┤
 │ 54 rows (40 shown) │
 └─────────────────────────────────────────────────────────────────────────────────────────────┘
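Every README hunk above applies the same directory rename, `american_community_survey` to `data_processing`. A hedged sketch of applying that rename to a stale local reference with `sed` (the `notes.txt` file here is hypothetical; note the commit itself is selective, e.g. the `~/data/american_community_survey` output path stays unchanged, so a blanket substitution goes further than the commit does):

```shell
# Hypothetical local file containing a stale path reference.
printf 'cd american_community_survey\n' > notes.txt

# Rewrite the old directory name to the new one; -i.bak keeps a backup copy.
sed -i.bak 's/american_community_survey/data_processing/g' notes.txt

cat notes.txt  # -> cd data_processing
```

`-i.bak` works with both GNU and BSD sed, which matters if the repo is used on both Linux and macOS.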

data_processing/dbt_project.yml (+5 −5)
@@ -1,12 +1,12 @@
 # Name your project! Project names should contain only lowercase characters
 # and underscores. A good package name should reflect your organization's
 # name or the intended use of these models
-name: "american_community_survey"
+name: "data_processing"
 version: "1.0.0"
 config-version: 2
 
 # This setting configures which "profile" dbt uses for this project.
-profile: "american_community_survey"
+profile: "data_processing"
 
 # Variables that can be changed from the command line using the `--vars` flag:
 # example: dbt run --vars 'my_variable: my_value'
@@ -28,8 +28,8 @@ macro-paths: ["macros"]
 snapshot-paths: ["snapshots"]
 
 clean-targets: # directories to be removed by `dbt clean`
-  - "target"
-  - "dbt_packages"
+  - "target"
+  - "dbt_packages"
 
 # Configuring models
 # Full documentation: https://docs.getdbt.com/docs/configuring-models
@@ -38,7 +38,7 @@ clean-targets: # directories to be removed by `dbt clean`
 # directory as views. These settings can be overridden in the individual model
 # files using the `{{ config(...) }}` macro.
 models:
-  american_community_survey:
+  data_processing:
     # Config indicated by + and applies to all files under models/example/
     # example:
     +materialized: view
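After this change, the renamed keys of `data_processing/dbt_project.yml` would read roughly as follows. This is a sketch assembled only from the hunks above, not the full file; line indentation is assumed, since the diff view does not preserve it:

```yaml
name: "data_processing"
version: "1.0.0"
config-version: 2

profile: "data_processing"

models:
  data_processing:
    # Config indicated by + and applies to all files under models/example/
    # example:
    +materialized: view
```

In dbt, the `name:` key, the `profile:` key, and the key under `models:` must all agree with the profile defined in `profiles.yml`, which is why this commit renames all three together.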
