Exploring Predictive Policing in San Diego for DSC180B Capstone Project
Here is the link to the website.
Link to the GIS map can be found at https://arcg.is/1CmX0r
To replicate the entire (or subsets of the) project, copy and paste python run.py in the command line while in the root directory followed by the arguments below:
data: Ingests raw data from online sources.process: Runs the pipeline for cleaning and formatting raw datasets.eda: Performs exploratory data analysis and outputs visualizations.analyze: Performs statistical tests on differences in observed proportions between PredPol and non-PredPol instances.test-project: Runs the entire pipeline from start to end on a smaller, versioned test data.
For example, running the code below would reproduce the entire project:
python run.py data process eda analyze
The project consists of these portions:
PROJECT
├── config
├── data-params.json
├── process-params.json
├── eda-params.json
├── analyze-params.json
├── test-data-params.json
├── test-process-params.json
├── test-eda-params.json
├── test-analyze-params.json
└── env.json
├── data
├── raw
└── cleaned
├── notebooks
└── .gitkeep
├── references
├── arrest_charges.json
├── arrest_types.json
├── crime_charges.json
├── crime_types.json
├── divisions_mapper.json
├── nhgis0005_ds172_2010_block_codebook.txt
└── races.json
└── src
├── etl.py
├── eda.py
├── analyze.py
└── geospatial.py
├── test_data
├── raw
└── cleaned
├── viz
├── EDA
├── Arrests
├── Crime
└── Stops
└── Analysis
├── Arrests
├── Crime
└── Stops
├── .gitignore
├── Dockerfile
├── README.md
├── requirements.txt
├── run.py
data-params.json: Common parameters for getting data, serving as inputs to library code.process-params.json: Parameters for processing data.eda-params.json: Parameters for exploratory analysis on each dataset.analyze-params.json: Parameters for statistical testings and analyses.env.json: Parameters for loading virtual environment.- Also contains similar configurations for test data.
raw/: Raw datasets from original source.cleaned/: Cleaned datasets.
- Jupyter notebooks for analyses and code development
- notebooks will be removed after migration to library code.
- Data Dictionaries, references to external sources.
etl.py: Library code that executes tasks useful for getting data.
- Versioned test data.
- Visual outputs from EDA and analyses pipelines.
- Docker image to replicate the environment the project was developed in.
- Python libraries/modules used as well as their corresponding versions.
- Main driver for project replication