- File manipulations
- Change all .txt files from 2 to 3/4 columns
- modify_columns.py
- If the number of columns is 2, this script fixes the file and overwrites it
- Output a notes.txt to show what was done to each file
- Need to modify the hard-coded directory path before running
- Remove all carriage returns and correct ascii characters from .xml files
- clean_text.py
- Clean the text field in .xml
- clean_xml_and_newline.py
- Remove carriage returns
- Expand the escaped characters, for example: `&#163;` becomes £
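
The clean-up described for clean_xml_and_newline.py (remove carriage returns, expand escaped characters) could look roughly like the sketch below. This is not the script's actual code: the function name `clean_xml_text` is hypothetical, and the sketch assumes the escaped characters are character references that Python's standard `html.unescape` can expand.

```python
import html


def clean_xml_text(text: str) -> str:
    """Remove carriage returns and expand character references in a text value.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Drop carriage returns; line feeds are left untouched.
    without_cr = text.replace("\r", "")
    # Expand references such as "&#163;" into the character they encode.
    return html.unescape(without_cr)


if __name__ == "__main__":
    print(clean_xml_text("Cost: &#163;5\r\n"))
```

Note that `html.unescape` also expands `&lt;` and `&amp;`, so a helper like this is only safe on the value of a text field, not on a whole .xml document.
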
- Scrape Archivist
- Download all the .xml from Instrument
- scrape_archivist_selenium.py
- Use selenium to download all .xml files
- Need to modify the user login and password
- Scrape Archivist continuous
- Download all the .txt from Datasets
- Setup Heroku
- Record how to insert table from file on wiki
- Fix problems around ncds_81_i.xml:
- could not download: fixed in scrape_archivist_selenium.py
- file contains `&amp;#` instead of `&#`: fixed in clean_text.py
- Note: need to run clean_text.py first, then clean_xml_and_newline.py
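
One possible reading of the `&#` problem above is that the ampersand of a numeric character reference was escaped twice (`&amp;#163;` instead of `&#163;`). Under that assumption, the sketch below shows why the fix has to run before the escaped characters are expanded. The names `DOUBLE_ESCAPED` and `fix_double_escaped` are hypothetical and are not taken from clean_text.py.

```python
import html
import re

# Matches a numeric character reference whose ampersand was escaped twice,
# for example "&amp;#163;" (decimal) or "&amp;#xA3;" (hexadecimal).
DOUBLE_ESCAPED = re.compile(r"&amp;(#(?:\d+|[xX][0-9a-fA-F]+);)")


def fix_double_escaped(text: str) -> str:
    """Turn "&amp;#163;" back into "&#163;" so it can be expanded later."""
    return DOUBLE_ESCAPED.sub(r"&\1", text)


if __name__ == "__main__":
    raw = "Cost: &amp;#163;5"
    # Expanding first only undoes one level of escaping.
    print(html.unescape(raw))
    # Fixing the double escape first lets the reference expand fully.
    print(html.unescape(fix_double_escaped(raw)))
```

Expanding without the fix leaves a literal `&#163;` in the text, which matches the note about running clean_text.py before clean_xml_and_newline.py.
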
- Process NCDS_2004_tables_version5.xlsx
- pre_process_db_input.py
- Output csv files
- Build database using the above csv files, see Populate database wiki
- db_temp.sql
- Insert all the output csv files from pre_process_db_input.py into temporary tables
- db_insert.sql
- From temporary tables, insert to database tables
- db_delete.sql
- Delete a study
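
The load, insert and delete steps above follow a staging-table pattern. The sketch below illustrates that pattern with Python's built-in `sqlite3` instead of the project's SQL scripts, so it is an analogy and not a copy of db_temp.sql, db_insert.sql or db_delete.sql. The table names `temp_question` and `question`, the columns `study` and `label`, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def load_via_temp_table(conn, csv_file):
    """Stage CSV rows in a temporary table, then copy them into the real table."""
    rows = [(r["study"], r["label"]) for r in csv.DictReader(csv_file)]
    # Step 1: load the raw CSV rows into a temporary (staging) table.
    conn.execute("CREATE TEMP TABLE temp_question (study TEXT, label TEXT)")
    conn.executemany("INSERT INTO temp_question VALUES (?, ?)", rows)
    # Step 2: copy from the staging table into the database table.
    conn.execute(
        "INSERT INTO question (study, label) "
        "SELECT study, label FROM temp_question"
    )
    conn.execute("DROP TABLE temp_question")
    conn.commit()


def delete_study(conn, study):
    """Remove every row that belongs to one study."""
    conn.execute("DELETE FROM question WHERE study = ?", (study,))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE question (study TEXT, label TEXT)")
    sample = io.StringIO("study,label\nstudy_a,q1\nstudy_a,q2\n")
    load_via_temp_table(conn, sample)
    print(conn.execute("SELECT COUNT(*) FROM question").fetchone()[0])
    delete_study(conn, "study_a")
    print(conn.execute("SELECT COUNT(*) FROM question").fetchone()[0])
```

Staging the rows first means the csv output can be checked or transformed in the temporary table before anything touches the database tables.
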
- The Longitudinal Study of Young People in England (LSYPE), also known as "Next Steps"
- UCL Centre for Longitudinal Studies COVID-19 Online Survey Questionnaire
- Export and clean xml files
- Understanding Society Coronavirus Study
- Code lists could be used for different questions, fix LSYPE studies
- LSYPE_clean_codes.py
- Same script used for all studies; need to modify the input dir
- Generation Scotland COVID19 Study
- Export txt files from archivist
- Gitlab work flow
- Understanding Society Covid-19 study
- Questionnaires
- [parse_us_covid_xml.py](https://github.com/jli755/python_scripts/blob/master/parse_us_covid_xml.py)