---
layout: page
title: Prepare and Submit a Job
tagline:
---

Continuing with the previous example of kallisto (see Find an Application), we know there are three required input files, one required input parameter, and two optional input parameters for this specific application.


#### Build a job template file

To run an instance of this application with our data (called a "job"), we must first assemble a JSON description of the job we would like to run. The simplest way to do this is to use the Agave jobs-template command:

% jobs-template kallisto-0.43.1u3

{
  "name": "kallisto test-1506708374",
  "appId": "kallisto-0.43.1u31",
  "archive": true,
  "inputs": {
    "transcripts": "read1.fastq",
    "fastq1": "",
    "fastq2": ""
  },
  "parameter": {
    "output": "output"
  }
}

By default, the template is printed to the screen. To store it in a file instead, redirect the output:

% jobs-template kallisto-0.43.1u3 > kallisto_job.json

We see very basic information including an identifying name for our job, the identity of the app, and whether it should be archived. Also, we see only the required inputs and parameters. (All optional inputs and parameters can be automatically supplied using the -A flag. See jobs-template -h for more details).
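
For example, to generate a template that also includes all of the optional inputs and parameters (using the same redirect as above; the exact fields emitted will depend on the app), you might run:

% jobs-template -A kallisto-0.43.1u3 > kallisto_job.json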


#### Add input files to job template

If you staged your own data to your private storage system, now is the time to provide the path to your data. This is easy to do using Agave URIs. For example, see these URIs describing the path to publicly accessible data for this kallisto job:

 "inputs": {
   "transcripts": "agave://data-sd2e-community/sample/kallisto/test/transcripts.fasta.gz",
   "fastq1": "agave://data-sd2e-community/sample/kallisto/test/reads_1.fastq.gz",
   "fastq2": "agave://data-sd2e-community/sample/kallisto/test/reads_2.fastq.gz"
 },

An Agave URI always begins with the prefix agave://, followed by the name of the storage system, then the complete path to the file relative to the system root directory, ending with the file name. Modify your kallisto_job.json file to point either to this public data or to the data you have staged on your private storage system.
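
Schematically, an Agave URI has the following form (the bracketed pieces are placeholders, not real locations):

 agave://<storage-system>/<path/relative/to/system/root>/<filename>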


#### Add parameters to job template

Initially only the required parameter output is listed. If we wish to use non-default values for the unlisted parameters, bootstrap and seed, we must add them to the job template now:

 "parameters": {
   "output": "output",
   "bootstrap": 100,
   "seed": 1
 }

The JSON format is very unforgiving of typos. If your job file is not accepted, consider running it through a JSON validator.
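
For example, one quick local check (assuming Python is available on your system) is the built-in json.tool module, which reports the location of the first syntax error if the file is not valid JSON:

% python -m json.tool kallisto_job.json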


#### Submit a job

Once you are satisfied that your data is staged and the job template file contains the instructions you want to use to run the job, use the jobs-submit command to submit the job:

% jobs-submit -F kallisto_job.json
Successfully submitted job 833421020533756391-242ac11b-0001-007

If there are no errors, you will see a success message along with a long unique identifier (UID) for your job. You can monitor the progress of the job with the jobs-list command and, optionally, the job UID:

% jobs-list
833421020533756391-242ac11b-0001-007 PENDING
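
To check on just this job rather than all of your recent jobs, you can pass the job UID to jobs-list:

% jobs-list 833421020533756391-242ac11b-0001-007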

#### Download the results

Once the job status is FINISHED, you can list what output is available:

% jobs-output-list 833421020533756391-242ac11b-0001-007
.agave.log
kallisto-test-1518033119-833421020533756391-242ac11b-0001-007.err
kallisto-test-1518033119-833421020533756391-242ac11b-0001-007.out
output
reads_1.fastq.gz
reads_2.fastq.gz
transcript.idx
transcripts.fasta.gz

Because this job had "archive": true set in the job template, the output files were archived in your tacc-work space under a folder named after your username. A copy of the output also remains in the jobs API space for easy access and, if desired, chaining into other jobs. The important output for this particular job is located in the output subdirectory. Download just the output subdirectory using the following command:

% jobs-output-get -r 833421020533756391-242ac11b-0001-007 /output/
Downloading output/abundance.h5 ...
######################################################################## 100.0%
Downloading output/abundance.tsv ...
######################################################################## 100.0%
Downloading output/run_info.json ...
######################################################################## 100.0%

Alternatively, you can download all of the job files, including output, logs, and other runtime files, using the following command:

% jobs-output-get -r 833421020533756391-242ac11b-0001-007

Return to the API Documentation Overview