HayleyMills/python_scripts

 
 


October 2019

Week 1 (Oct 1 - Oct 4)

  1. File manipulations
    1. Change all .txt files from 2 to 3/4 columns
      1. modify_columns.py
        • If a file has 2 columns, this script fixes it and overwrites the file
        • Outputs a notes.txt showing what was done to each file
        • Need to modify the hard-coded directory path before running
    2. Remove all carriage returns and correct ascii characters from .xml files
      1. clean_text.py
        • Clean the text field in .xml
      2. clean_xml_and_newline.py
        • Remove carriage returns
        • Expand escaped character references, for example: the reference for the pound sign (such as &#163;) becomes £
  2. Scrape Archivist
    1. Download all the .xml from Instrument
      1. scrape_archivist_selenium.py
        • Use selenium to download all .xml files
        • Need to modify the user log in and password
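
The clean-up described for the .xml files can be sketched roughly as below. This is a minimal illustration, not the code in clean_xml_and_newline.py: the function names are hypothetical, and it assumes the escaped characters are numeric character references. References to XML-special characters (such as the ampersand) are deliberately left escaped so the file stays well-formed, which is a design choice of this sketch.

```python
import re

# Matches decimal (&#163;) and hexadecimal (&#xA3;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

# Characters that must stay escaped for the XML to remain well-formed.
_XML_SPECIAL = {"&", "<", ">", '"', "'"}


def expand_numeric_refs(text):
    """Replace numeric character references with the characters they name."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        try:
            code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
            char = chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid code point: leave the reference untouched.
            return match.group(0)
        return match.group(0) if char in _XML_SPECIAL else char

    return _NUMERIC_REF.sub(_replace, text)


def clean_xml_text(text):
    """Remove carriage returns, then expand numeric character references."""
    without_cr = text.replace("\r", "")
    return expand_numeric_refs(without_cr)
```

For example, `clean_xml_text("<q>Cost: &#163;5\r\n</q>")` drops the carriage return and turns the reference into a literal pound sign, while `&#38;` is kept as written.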

Week 2 (Oct 7 - Oct 11)

  1. Scrape Archivist, continued
    1. Download all the .txt from Datasets
  2. Setup Heroku
    1. Record on the wiki how to insert a table from a file
  3. Fix problems around ncds_81_i.xml:
    1. could not download: fixed in scrape_archivist_selenium.py
    2. file contains &amp;# (a double-escaped ampersand) instead of &#: fixed in clean_text.py
    3. Note: run clean_text.py first, then clean_xml_and_newline.py
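
Why the order of the two scripts matters can be shown with a small sketch. It assumes the problem in item 2 is a double-escaped reference, where the file holds `&amp;#` but `&#` was intended. Both functions are hypothetical stand-ins for the roles of clean_text.py and clean_xml_and_newline.py, not their actual code.

```python
import re


def repair_double_escaped_refs(text):
    """Stand-in for the clean_text.py step: turn '&amp;#' back into '&#'."""
    return text.replace("&amp;#", "&#")


def expand_decimal_refs(text):
    """Stand-in for the later step: expand decimal references such as &#163;."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


raw = "Income in &amp;#163;"

# Expanding first finds nothing to expand, so the reference survives.
wrong_order = repair_double_escaped_refs(expand_decimal_refs(raw))

# Repairing first exposes the reference, which is then expanded.
right_order = expand_decimal_refs(repair_double_escaped_refs(raw))
```

In this sketch `wrong_order` still contains an unexpanded reference, while `right_order` contains the literal pound sign.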

November 2019

  1. Process NCDS_2004_tables_version5.xlsx
    1. pre_process_db_input.py
      • Output csv files
  2. Build the database from the csv files above; see the Populate database wiki page
    1. db_temp.sql
    2. db_insert.sql
      • From temporary tables, insert to database tables
    3. db_delete.sql
      • Delete a study
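
The temporary-table workflow above can be sketched in Python as follows. This is only an illustration under several assumptions: it uses the standard-library `sqlite3` module in place of the real database, and the table names (`temp_question`, `question`), the column names, and the sample rows are all made up for the example rather than taken from the .sql files.

```python
import csv
import io
import sqlite3


def load_csv_into_temp(conn, csv_file):
    """Load csv rows into a temporary staging table (hypothetical schema)."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS temp_question "
        "(study TEXT, label TEXT, literal TEXT)"
    )
    reader = csv.DictReader(csv_file)
    rows = [(r["study"], r["label"], r["literal"]) for r in reader]
    conn.executemany("INSERT INTO temp_question VALUES (?, ?, ?)", rows)


def insert_from_temp(conn):
    """Copy staged rows into the database table, then empty the staging table."""
    conn.execute(
        "INSERT INTO question (study, label, literal) "
        "SELECT study, label, literal FROM temp_question"
    )
    conn.execute("DELETE FROM temp_question")


def delete_study(conn, study):
    """Remove every row that belongs to one study."""
    conn.execute("DELETE FROM question WHERE study = ?", (study,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (study TEXT, label TEXT, literal TEXT)")

# Made-up sample data standing in for one of the generated csv files.
sample = io.StringIO(
    "study,label,literal\n"
    "demo_study,q1,First question\n"
    "demo_study,q2,Second question\n"
)
load_csv_into_temp(conn, sample)
insert_from_temp(conn)
count_after_insert = conn.execute("SELECT COUNT(*) FROM question").fetchone()[0]

delete_study(conn, "demo_study")
count_after_delete = conn.execute("SELECT COUNT(*) FROM question").fetchone()[0]
```

Staging the csv rows first keeps the load step separate from the insert into the real tables, so a bad file can be inspected or discarded before it touches them.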

Dec 2019 - Apr 2020

  1. The Longitudinal Study of Young People in England (LSYPE), also known as "Next Steps"
    1. Wave 8
    2. Wave 7
    3. Wave 5
    4. Wave 4
    5. Wave 3
    6. Wave 2
    7. Wave 1

May 2020

  1. UCL Centre for Longitudinal Studies COVID-19 Online Survey Questionnaire

    1. Wave 1
  2. Export and clean xml files

    1. archivist_click_export_button.py
    2. export_clean_xml.py
  3. Understanding Society Coronavirus Study

    1. April 2020 questionnaire
  4. Code lists can be shared by different questions; fix the LSYPE studies

    1. LSYPE_clean_codes.py
      • The same script is used for all studies; need to modify the input directory
