GSA documentation home
Welcome to the GSA-pipeline wiki!
Original documentation is here: [GSA wiki on WormBase](http://wiki.wormbase.org/index.php/Caltech_documentation)
The following cron jobs on textpresso-dev.caltech.edu {WHERE??} automate the GSA pipeline.
The location of each script is shown, and the cron schedules show the order in which the scripts run.
Worm
Step 1: download entities
05 06 * * * cd /home/arunr/gsa/worm/scripts; ./01downloadModEntities.pl 2>/dev/null >/dev/null;
25 15 * * * cd /home/arunr/gsa/worm/scripts; ./01create_elegans_gene_list.pl 2>/dev/null >/dev/null;
27 15 * * * cd /home/arunr/gsa/worm/scripts; ./02create_elegans_variation_list.pl 2>/dev/null >/dev/null;
31 17 * * * cd /home/arunr/gsa/worm/scripts; ./03create_elegans_transgene_list.pl 2>/dev/null >/dev/null;
NOTE: You may need to run the following manually each week on textpresso-dev.caltech.edu, using the citace password:
$ cd /home/arunr/gsa/worm/scripts
$ ./01downloadModEntities.pl
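After a manual download you may not want to wait for the next scheduled lexicon build (worm runs it only on Tuesday, Thursday and Saturday). A minimal sketch of running the two worm steps back to back follows. It is not part of the existing pipeline: the function name `refresh_worm_entities` and the `GSA_HOME` override are hypothetical, and it assumes, based on the step order above, that `02formSortedLexicon.pl` builds on what `01downloadModEntities.pl` downloads.

```shell
# bash: source this, then run refresh_worm_entities
# Hypothetical helper: download worm entities by hand, then rebuild the
# sorted lexicon immediately instead of waiting for the cron run.
# GSA_HOME is an assumed override; it defaults to the path in the crontab.
refresh_worm_entities() {
    local scripts="${GSA_HOME:-/home/arunr/gsa}/worm/scripts"
    # Subshell, so the caller's working directory is left unchanged.
    (
        cd "$scripts" || exit 1
        # Interactive: this step may prompt for the citace password.
        ./01downloadModEntities.pl || exit 1
        # Assumed dependency: the lexicon is built from the fresh download.
        ./02formSortedLexicon.pl
    )
}
```

If the download step fails, the lexicon step is skipped and the function returns a non-zero status.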
Step 2: form sorted lexicon
20 20 * * 2,4,6 cd /home/arunr/gsa/worm/scripts; ./02formSortedLexicon.pl 2>/dev/null >/dev/null;
Step 3: check incoming_xml/ for new XML files and run the linking script on any that are found
12,27,43,57 * * * * cd /home/arunr/gsa/worm/scripts; ./03link.pl ../incoming_xml/ ../html/ 2>/dev/null >/dev/null;
Step 3.1: check whether a WB curator wants the linking script re-run after adding new entities via the journal first-pass form
13,28,43,58 * * * * cd /home/arunr/gsa/worm/scripts; ./06rerunLinking.pl 2>/dev/null >/dev/null;
Step 4: FTP the linked XML file after the curator submits the file for FTP
06,21,36,51 * * * * cd /home/arunr/gsa/worm/scripts; ./05ftpAndEmailDjs.pl
Yeast
Step 1: download entities
00 07 * * 0 cd /home/arunr/gsa/yeast/scripts/; ./01downloadModEntities.pl 2>/dev/null >/dev/null;
Step 2: form sorted lexicon
10 07 * * 0 cd /home/arunr/gsa/yeast/scripts/; ./02formSortedLexicon.pl 2>/dev/null >/dev/null;
Step 3: check incoming_xml/ for new XML files and run the linking script on any that are found
01,16,31,46 * * * * cd /home/arunr/gsa/yeast/scripts; ./03link.pl ../incoming_xml/ ../html/ 2>/dev/null >/dev/null;
Step 4: FTP the linked XML file after the curator submits the file for FTP
09,29,49 * * * * cd /home/arunr/gsa/yeast/scripts; ./07run04and05.pl 2>/dev/null >/dev/null;
Fly
Step 1: download entities
00 08 * * 0 cd /home/arunr/gsa/fly/scripts/; ./01downloadModEntities.pl 2>/dev/null >/dev/null;
Step 2: form sorted lexicon
10 09 * * 0 cd /home/arunr/gsa/fly/scripts/; ./02formSortedLexicon.pl 2>/dev/null >/dev/null;
Step 3: check incoming_xml/ for new XML files and run the linking script on any that are found
02,17,32,47 * * * * cd /home/arunr/gsa/fly/scripts; ./03link.pl ../incoming_xml/ ../html/ 2>/dev/null >/dev/null;
Step 4: FTP the linked XML file after the curator submits the file for FTP
01,21,41 * * * * cd /home/arunr/gsa/fly/scripts; ./07run04and05.pl 2>/dev/null >/dev/null;
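A typo in a crontab `cd` path, such as a misspelled MOD directory or a `:` typed where `;` was meant, stops the job from running its script, and because most jobs discard their output the failure may go unnoticed. The following is a minimal sketch of a check. It assumes every job line follows the `<schedule> cd <dir>; ./script` pattern used above, and the function name `check_cron_dirs` is hypothetical.

```shell
# bash: source this, then pipe crontab lines into check_cron_dirs
# Hypothetical check: pull out the directory after each "cd" and report
# paths that do not exist, or that end in ":" instead of being followed by ";".
check_cron_dirs() {
    local status=0 line dir
    while IFS= read -r line; do
        # Skip blank lines and comments.
        case "$line" in ''|'#'*) continue ;; esac
        # First word after "cd", stopping at whitespace or ";".
        dir=$(printf '%s\n' "$line" |
              sed -n 's/.*[[:space:]]cd[[:space:]]\{1,\}\([^[:space:];]*\).*/\1/p')
        [ -n "$dir" ] || continue
        case "$dir" in
            *:) printf 'colon after cd path: %s\n' "$dir"; status=1; continue ;;
        esac
        if [ ! -d "$dir" ]; then
            printf 'missing directory: %s\n' "$dir"
            status=1
        fi
    done
    return "$status"
}
```

Usage on the server: `crontab -l | check_cron_dirs`. It prints nothing and returns 0 when every `cd` target exists.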
A file gets stuck in the pipeline because of a problem in the XML itself. An email alerts the developer to the line that is causing the problem.
Action: edit the latest XML file in:
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/incoming_xml
where <MOD> is fly, worm or yeast.
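To find the file to edit and confirm the fix before the next Step 3 run, helpers along these lines may be useful. This is a sketch: `latest_xml`, `check_xml` and the `GSA_DOCROOT` override are hypothetical, "latest" is taken to mean most recently modified, and the check assumes `xmllint` (from libxml2) is installed. `xmllint --noout` prints nothing for a well-formed file and reports the offending line otherwise.

```shell
# bash: source this, then call latest_xml <MOD> and check_xml <file>
# Print the most recently modified .xml file in <MOD>/incoming_xml.
latest_xml() {
    local dir="${GSA_DOCROOT:-/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa}/$1/incoming_xml"
    # ls -t sorts newest first; head keeps only the newest entry.
    ls -t "$dir"/*.xml 2>/dev/null | head -n 1
}

# Well-formedness check; on failure xmllint names the bad line on stderr.
check_xml() {
    xmllint --noout "$1"
}
```

For example, `check_xml "$(latest_xml worm)"` after editing; a zero exit status means the file now parses.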
DJS needs the paper to be redone
Action: the developer must clear the paper from the pipeline by deleting the file containing the document number in each of the following directories:
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/incoming_xml/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/done/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/logs/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/html/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/entity_link_tables/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_logs/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_entity_link_tables/
Then rerun the linking script manually:
$ cd /home/arunr/gsa/<MOD>/scripts/
$ ./03link.pl ../incoming_xml/<docid>.xml ../html
Curators will be re-sent the newly linked paper with a link to the new entity table.
QC curators need the paper to be redone
Action: the developer must clear the paper from the pipeline by deleting the file containing the document number in each of the following directories. DO NOT delete the incoming XML file in /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/incoming_xml/.
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/done/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/logs/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/html/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/entity_link_tables/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_logs/
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_entity_link_tables/
Then rerun the linking script manually:
$ cd /home/arunr/gsa/<MOD>/scripts/
$ ./03link.pl ../incoming_xml/<docid>.xml ../html
Curators will receive alerts as before.
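Both redo cases use the same cleanup and differ only in whether the incoming XML file is removed. A minimal sketch follows, assuming each per-paper file has the document ID somewhere in its name. `clear_paper`, its options and `GSA_DOCROOT` are hypothetical. Because the name match is a substring match (an ID can match a longer ID that contains it), the default is to only list files; pass `--delete` once the list looks right.

```shell
# bash: source this, then call clear_paper
# Usage: clear_paper <MOD> <docid> [--keep-incoming] [--delete]
#   --keep-incoming  QC redo: leave incoming_xml/ untouched
#   --delete         remove the files; without it they are only printed
clear_paper() {
    if [ "$#" -lt 2 ] || [ -z "$2" ]; then
        echo "usage: clear_paper <MOD> <docid> [--keep-incoming] [--delete]" >&2
        return 2
    fi
    local mod="$1" docid="$2" opt sub keep_incoming=0 delete=0
    shift 2
    for opt in "$@"; do
        case "$opt" in
            --keep-incoming) keep_incoming=1 ;;
            --delete) delete=1 ;;
            *) echo "unknown option: $opt" >&2; return 2 ;;
        esac
    done
    local root="${GSA_DOCROOT:-/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa}/$mod"
    local subdirs="done logs html entity_link_tables first_pass_logs first_pass_entity_link_tables"
    [ "$keep_incoming" -eq 1 ] || subdirs="incoming_xml $subdirs"
    for sub in $subdirs; do
        [ -d "$root/$sub" ] || continue
        # Only regular files directly in the directory whose name contains the ID.
        if [ "$delete" -eq 1 ]; then
            find "$root/$sub" -maxdepth 1 -type f -name "*$docid*" -print -delete
        else
            find "$root/$sub" -maxdepth 1 -type f -name "*$docid*" -print
        fi
    done
}
```

For a QC redo, `clear_paper worm <docid> --keep-incoming` lists what would go, and adding `--delete` removes it; then rerun `03link.pl` as described above.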
FTP fails for some reason
Action: the file must be FTP'd manually to DJS at ftp1.dartmouthjournals.com. The file to send is:
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/linked_xml/<docid>.XML
where <docid> is the document ID and <MOD> is fly, worm or yeast. Ask a curator for the username and password.
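Any FTP client will do for the manual upload. One option is `curl`: its `-T` (`--upload-file`) option uploads a local file, and giving `--user` only a username makes it prompt for the password. This is a sketch: the function names and `GSA_DOCROOT` are hypothetical, and uploading into the login directory on the DJS server (rather than a specific remote folder) is an assumption to confirm with a curator.

```shell
# bash: source this, then call ftp_to_djs <MOD> <docid> <username>
# Print the linked XML path for a MOD (fly, worm or yeast) and document ID.
linked_xml_path() {
    local mod="$1" docid="$2"
    case "$mod" in
        fly|worm|yeast) ;;
        *) echo "MOD must be fly, worm or yeast" >&2; return 2 ;;
    esac
    printf '%s/%s/linked_xml/%s.XML\n' \
        "${GSA_DOCROOT:-/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa}" \
        "$mod" "$docid"
}

# Upload with curl; the trailing "/" makes curl keep the local file name.
ftp_to_djs() {
    local mod="$1" docid="$2" user="$3" file
    file=$(linked_xml_path "$mod" "$docid") || return
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    curl -T "$file" --user "$user" "ftp://ftp1.dartmouthjournals.com/"
}
```

The existence check runs before `curl`, so a wrong MOD or document ID fails locally instead of producing a confusing FTP error.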
Changes in developers, GSA editors or DJS personnel
Action: add or remove email addresses as needed. The files containing the addresses are located in:
/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/emails/
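The format of the files in emails/ is not described here, so when someone leaves it can help to first find every file that mentions their address across all three MODs. A minimal sketch, where `find_email_files` and `GSA_DOCROOT` are hypothetical:

```shell
# bash: source this, then call find_email_files <address>
# List the files under fly, worm and yeast emails/ that contain an address.
find_email_files() {
    local address="$1" root mod
    root="${GSA_DOCROOT:-/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa}"
    for mod in fly worm yeast; do
        [ -d "$root/$mod/emails" ] || continue
        # -r recurse, -l file names only, -i ignore case, -F literal text
        grep -rliF -- "$address" "$root/$mod/emails"
    done
    # grep exits 1 when nothing matches; an empty list is not an error here.
    return 0
}
```

Each listed file can then be edited by hand, which keeps whatever format the pipeline expects intact.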