GSA documentation home

Karen Yook edited this page Mar 1, 2016 · 3 revisions

Welcome to the GSA-pipeline wiki!

Pipeline scripts

Original documentation is here: [GSA wiki on WormBase](http://wiki.wormbase.org/index.php/Caltech_documentation)


The following cron jobs on textpresso-dev.caltech.edu {WHERE??} automate the GSA pipeline.
Script locations are shown, and the cron details show the order in which the scripts are run.

WB Scripts

Step 1: download entities
05 06 * * * cd /home/arunr/gsa/worm/scripts; ./01downloadModEntities.pl 2>/dev/null >/dev/null;
25 15 * * * cd /home/arunr/gsa/worm/scripts; ./01create_elegans_gene_list.pl 2>/dev/null >/dev/null;
27 15 * * * cd /home/arunr/gsa/worm/scripts; ./02create_elegans_variation_list.pl 2>/dev/null >/dev/null;
31 17 * * * cd /home/arunr/gsa/worm/scripts; ./03create_elegans_transgene_list.pl 2>/dev/null >/dev/null;
NOTE: The following may have to be run manually each week on textpresso-dev.caltech.edu, using the password for citace:
$ cd /home/arunr/gsa/worm/scripts
$ ./01downloadModEntities.pl

Step 2: form sorted lexicon
20 20 * * 2,4,6 cd /home/arunr/gsa/worm/scripts; ./02formSortedLexicon.pl 2>/dev/null >/dev/null;
Step 3: check for new XML files and run if any new files in incoming_xml/
12,27,43,57 * * * * cd /home/arunr/gsa/worm/scripts; ./03link.pl ../incoming_xml/ ../html/ 2>/dev/null >/dev/null;
Step 3.1: check if WB curator wants to re-run the linking script after adding new entities via the journal first pass form.
13,28,43,58 * * * * cd /home/arunr/gsa/worm/scripts; ./06rerunLinking.pl 2>/dev/null >/dev/null;
Step 4: FTP the linked XML file after the curator submits file for FTP
06,21,36,51 * * * * cd /home/arunr/gsa/worm/scripts; ./05ftpAndEmailDjs.pl

SGD Scripts

Step 1: download entities
00 07 * * 0 cd /home/arunr/gsa/yeast/scripts/; ./01downloadModEntities.pl 2>/dev/null >/dev/null;
Step 2: form sorted lexicon
10 07 * * 0 cd /home/arunr/gsa/yeast/scripts/; ./02formSortedLexicon.pl 2>/dev/null >/dev/null;
Step 3: checks for new XML files and runs if any new files
01,16,31,46 * * * * cd /home/arunr/gsa/yeast/scripts; ./03link.pl ../incoming_xml/ ../html/ 2>/dev/null >/dev/null;
Step 4: FTPs the linked XML file after the curator submits file for FTP
09,29,49 * * * * cd /home/arunr/gsa/yeast/scripts; ./07run04and05.pl 2>/dev/null >/dev/null;

Flybase Scripts

Step 1: download entities
00 08 * * 0 cd /home/arunr/gsa/fly/scripts/; ./01downloadModEntities.pl 2>/dev/null >/dev/null;
Step 2: form sorted lexicon
10 09 * * 0 cd /home/arunr/gsa/fly/scripts/; ./02formSortedLexicon.pl 2>/dev/null >/dev/null;
Step 3: checks for new XML files and runs if any new files
02,17,32,47 * * * * cd /home/arunr/gsa/fly/scripts; ./03link.pl ../incoming_xml/ ../html/ 2>/dev/null >/dev/null;
Step 4: FTPs the linked XML file after the curator submits file for FTP
01,21,41 * * * * cd /home/arunr/gsa/fly/scripts; ./07run04and05.pl 2>/dev/null >/dev/null;
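
Most of the cron entries above discard both stdout and stderr, so a failing step leaves no trace. When debugging, it can help to run a single step by hand and keep its output. The following is a minimal sketch, not part of the pipeline: the `run_step` function, the `GSA_HOME` and `GSA_LOG_DIR` variables, and the log file naming are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: run one pipeline script by hand and keep its output, instead of
# discarding it as the cron entries do.
# GSA_HOME, GSA_LOG_DIR and the log file name are illustrative assumptions.
GSA_HOME="${GSA_HOME:-/home/arunr/gsa}"

run_step() {
    mod="$1"       # worm, yeast or fly
    script="$2"    # e.g. 02formSortedLexicon.pl
    shift 2        # any remaining arguments are passed to the script
    log="${GSA_LOG_DIR:-/tmp}/gsa_${mod}_${script}.log"
    (
        # Same working directory as the cron entries use.
        cd "$GSA_HOME/$mod/scripts" || exit 1
        "./$script" "$@"
    ) >"$log" 2>&1
    status=$?
    echo "exit status $status, output in $log"
    return "$status"
}
```

For example, `run_step worm 03link.pl ../incoming_xml/ ../html/` would run the linking step with the same arguments as the cron entry and leave its output in the log file.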

Pipeline Problems/Solutions

  1. A file could get caught in the pipeline.
    A. If a problem in the XML itself is the cause, an email will alert the developer to the line that is causing the problem.
    Solution: edit the latest XML file in:
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/incoming_xml, where <MOD> is fly, worm or yeast.

  2. DJS needs the paper to be redone
    Developer will have to clear the pipeline of the paper. Delete the corresponding file containing the document number in each of:
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/incoming_xml/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/done/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/logs/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/html/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/entity_link_tables/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_logs/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_entity_link_tables/
    Rerun the script manually by:
    $ cd /home/arunr/gsa/<MOD>/scripts/
    $ ./03link.pl ../incoming_xml/<docid>.xml ../html
    The newly linked paper will be resent to the curators with a new entity table link.

  3. QC curators need the paper to be redone
    Developer will have to clear the pipeline of the paper. Delete the corresponding file containing the document number in each of:
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/incoming_xml/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/done/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/logs/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/html/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/entity_link_tables/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_logs/
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/first_pass_entity_link_tables/
    Rerun the script manually by:
    $ cd /home/arunr/gsa/<MOD>/scripts/
    $ ./03link.pl ../incoming_xml/<docid>.xml ../html
    The newly linked paper will be resent to the curators with a new entity table link.

  4. FTP fails for some reason
    File will need to be manually FTP'd to DJS at ftp1.dartmouthjournals.com. The file is:
    /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/linked_xml/<docid>.XML
    where <docid> is the document ID and <MOD> is fly, worm or yeast. Ask curator for username and password.

  5. Changes in developers, GSA editors, DJS personnel
    Email addresses need to be added or removed. The files with the email addresses are located in: /data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa/<MOD>/emails/
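
For problems 2 and 3 above, the same seven directories have to be checked for files containing the document number. The following is a rough sketch of how that could be scripted; the `clear_paper` function and the `GSA_ROOT` variable are hypothetical names, not part of the pipeline. It assumes the document number appears in each file name and that the files sit directly inside each directory. Because the manual rerun reads `../incoming_xml/<docid>.xml`, the sketch only lists matches by default, so each file can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper: list (default) or delete every file whose name
# contains a document number, across the pipeline directories named above.
# GSA_ROOT defaults to the docroot path given on this page.
GSA_ROOT="${GSA_ROOT:-/data2/srv/textpresso-dev.caltech.edu/www/docroot/gsa}"

clear_paper() {
    mod="$1"              # fly, worm or yeast
    docid="$2"            # document number
    action="${3:-list}"   # "list" (dry run) or "delete"
    for dir in incoming_xml done logs html entity_link_tables \
               first_pass_logs first_pass_entity_link_tables; do
        path="$GSA_ROOT/$mod/$dir"
        [ -d "$path" ] || continue
        # The pattern is a substring match, so a short document number can
        # also match longer ones; check the listing before deleting.
        if [ "$action" = "delete" ]; then
            find "$path" -maxdepth 1 -type f -name "*${docid}*" -print -exec rm -f {} \;
        else
            find "$path" -maxdepth 1 -type f -name "*${docid}*" -print
        fi
    done
}
```

Running `clear_paper worm <docid>` prints the matching files, and `clear_paper worm <docid> delete` removes them and prints each path as it goes.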
