jli755/python_scripts

October 2019

Week 1 (Oct 1 - Oct 4)

  1. File manipulations
    1. Change all .txt files from 2 to 3/4 columns
      1. modify_columns.py
        • If a file has 2 columns, this script fixes it and overwrites the file
        • Outputs notes.txt, recording what was done to each file
        • Need to modify the hard-coded directory path before running
    2. Remove all carriage returns and correct ascii characters from .xml files
      1. clean_text.py
        • Clean the text field in .xml
      2. clean_xml_and_newline.py
        • Remove carriage returns
        • Expand the escaped characters, for example: &#163; becomes £
  2. Scrape Archivist
    1. Download all the .xml from Instrument
      1. scrape_archivist_selenium.py
        • Use selenium to download all .xml files
        • Need to modify the user login and password before running
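
A minimal sketch of the column fix described for modify_columns.py, not the script itself. The README does not say which delimiter the .txt files use or what the extra columns should contain, so this assumes tab-separated files and pads each two-column row with empty fields. The function names and the note wording are hypothetical.

```python
from pathlib import Path


def pad_two_column_file(path, target_columns=3, delimiter="\t"):
    """Pad a two-column file to target_columns in place and return a note.

    Assumption: the added columns are empty; the real script may fill them.
    """
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines()
    rows = [line.split(delimiter) for line in lines if line.strip()]

    # Only rewrite files in which every non-blank row has exactly two fields.
    if not rows or any(len(row) != 2 for row in rows):
        return f"{path.name}: left unchanged"

    padded = [row + [""] * (target_columns - 2) for row in rows]
    path.write_text(
        "".join(delimiter.join(row) + "\n" for row in padded), encoding="utf-8"
    )
    return f"{path.name}: padded from 2 to {target_columns} columns"


def fix_directory(directory, target_columns=3):
    """Fix every .txt file in directory and log one line per file to notes.txt."""
    directory = Path(directory)
    notes = [
        pad_two_column_file(txt_file, target_columns)
        for txt_file in sorted(directory.glob("*.txt"))
        if txt_file.name != "notes.txt"  # do not process the log itself
    ]
    (directory / "notes.txt").write_text("\n".join(notes) + "\n", encoding="utf-8")
    return notes
```

Passing the directory as an argument, rather than hard-coding it, would remove the need to edit the script before each run.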

Week 2 (Oct 7 - Oct 11)

  1. Scrape Archivist (continued)
    1. Download all the .txt from Datasets
  2. Setup Heroku
    1. Record on the wiki how to insert a table from a file
  3. Fix problems around ncds_81_i.xml:
    1. could not download: fixed in scrape_archivist_selenium.py
    2. file contains &amp;# instead of &#: fixed in clean_text.py
    3. Note: run clean_text.py first, then clean_xml_and_newline.py
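
The ordering note above can be illustrated with a short sketch. It assumes the ncds_81_i.xml problem was a double-escaped character reference (for instance `&amp;#163;` where `&#163;` was intended); that reading, and all function names here, are assumptions rather than the contents of clean_text.py or clean_xml_and_newline.py. References to XML special characters are left escaped so the file stays well-formed.

```python
import re

NUMERIC_REFERENCE = re.compile(r"&#(\d+);")
# Code points for " & ' < > must stay escaped inside XML text.
XML_SPECIAL = {34, 38, 39, 60, 62}


def fix_double_escaped(text):
    """Assumed clean_text.py step: turn '&amp;#163;' back into '&#163;'."""
    return text.replace("&amp;#", "&#")


def remove_carriage_returns(text):
    """Drop every carriage return, leaving plain newlines."""
    return text.replace("\r", "")


def _expand(match):
    code = int(match.group(1))
    if code in XML_SPECIAL or code > 0x10FFFF:
        return match.group(0)  # keep the reference as written
    return chr(code)


def expand_numeric_references(text):
    """Replace decimal references such as '&#163;' with the character itself."""
    return NUMERIC_REFERENCE.sub(_expand, text)


def clean(text):
    """Run the steps in the order the note requires."""
    text = fix_double_escaped(text)
    text = remove_carriage_returns(text)
    return expand_numeric_references(text)
```

Running the expansion step alone on `&amp;#163;` leaves the text untouched, because the pattern only matches once the double escape has been repaired; that is why the first script has to run before the second.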

November 2019

  1. Process NCDS_2004_tables_version5.xlsx
    1. pre_process_db_input.py
      • Output csv files
  2. Build the database from the csv files above; see the Populate database wiki
    1. db_temp.sql
    2. db_insert.sql
      • From temporary tables, insert to database tables
    3. db_delete.sql
      • Delete a study
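
The staged load described above (temporary tables first, then an insert into the database tables, plus a delete by study) can be sketched as follows. This is an illustration only: the contents of db_temp.sql, db_insert.sql and db_delete.sql are not shown here, so the table name `question`, its columns, and the study name `ncds_2004` are hypothetical. SQLite is used purely to keep the example self-contained; the real database may be a different engine.

```python
import csv
import io
import sqlite3


def create_schema(connection):
    """Create a hypothetical target table (stand-in for the real schema)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS question (study TEXT, label TEXT, literal TEXT)"
    )


def load_study(connection, csv_text):
    """Stage CSV rows in a temporary table, then copy them into the real table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "CREATE TEMP TABLE temp_question (study TEXT, label TEXT, literal TEXT)"
        )
        connection.executemany(
            "INSERT INTO temp_question VALUES (:study, :label, :literal)", rows
        )
        connection.execute(
            "INSERT INTO question (study, label, literal) "
            "SELECT study, label, literal FROM temp_question"
        )
        connection.execute("DROP TABLE temp_question")
    return len(rows)


def delete_study(connection, study):
    """Remove every row belonging to one study and return the number deleted."""
    with connection:
        cursor = connection.execute("DELETE FROM question WHERE study = ?", (study,))
    return cursor.rowcount
```

Staging the rows first means a malformed csv file fails before anything reaches the real table.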

Dec 2019 - Apr 2020

  1. The Longitudinal Study of Young People in England (LSYPE), also known as "Next Steps"
    1. Wave 8
    2. Wave 7
    3. Wave 5
    4. Wave 4
    5. Wave 3
    6. Wave 2
    7. Wave 1

May 2020

  1. UCL Centre for Longitudinal Studies COVID-19 Online Survey Questionnaire

    1. Wave 1
  2. Export and clean xml files

    1. archivist_click_export_button.py
    2. export_clean_xml.py
  3. Understanding Society Coronavirus Study

    1. April 2020 questionnaire
  4. Code lists can be shared across different questions; fix the LSYPE studies accordingly

    1. LSYPE_clean_codes.py
      • Same script used for all studies; need to modify the input dir

June 2020

  1. Generation Scotland COVID19 Study

    1. Wave 1
  2. Understanding Society

    1. wiki page

July 2020

  1. Export txt files from archivist
    1. export_archivist_txt.py

August 2020

  1. GitLab workflow

    1. Insert tables to archivist database
    2. Export xml/text files from archivist
  2. Understanding Society Covid-19 study

    1. Questionnaires
    2. parse_us_covid_xml.py (https://github.com/jli755/python_scripts/blob/master/parse_us_covid_xml.py)
