layout | title | tagline |
---|---|---|
page |
Prepare and Submit a Job |
Continuing with the previous example of kallisto (see Find an Application), we know there are three required input files, one required input parameter, and two optional input parameters for this specific application.
#### Build a job template file
To run an instance of this application with our data (called a "job"), we first
must assemble a json description of the job we would like to run. The simplest
way to do this is to use the Agave jobs-template
command:
% jobs-template kallisto-0.43.1u3
{
"name": "kallisto test-1506708374",
"appId": "kallisto-0.43.1u31",
"archive": true,
"inputs": {
"transcripts": "read1.fastq",
"fastq1": "",
"fastq2": ""
},
"parameter": {
"output": "output"
}
}
By default, this is output to the screen. To store it in a file instead, perform:
% jobs-template kallisto-0.43.1u3 > kallisto_job.json
We see very basic information including an identifying name for our job, the
identity of the app, and whether it should be archived. Also, we see only the
required inputs and parameters. (All optional inputs and parameters can be
automatically supplied using the -A
flag. See jobs-template -h
for more
details).
#### Add input files to job template
If you staged your own data to your private storage system, now is the time to provide the path to your data. This is easy to do using Agave URIs. For example, see these URIs describing the path to publicly accessible data for this kallisto job:
"inputs": {
"transcripts": "agave://data-sd2e-community/sample/kallisto/test/transcripts.fasta.gz",
"fastq1": "agave://data-sd2e-community/sample/kallisto/test/reads_1.fastq.gz",
"fastq2": "agave://data-sd2e-community/sample/kallisto/test/reads_2.fastq.gz"
},
The prefix for an Agave URI is always agave://
followed by the storage system,
followed by the complete path relative to the system root directory, and finally
the name of the file. Modify your kallisto_job.json
file to point to either this
public data, or your data that you have staged on your private storage system.
#### Add parameters to job template
Initially only the required parameter output
is listed. If we wish to use
non-default values for the unlisted parameters, bootstrap
and seed
, we must
add them to the job template now:
"parameters": {
"output": "output",
"bootstrap": 100,
"seed": 1
}
The json format is very unforgiving with typos. If your job file is not accepted, you may consider running it through an external JSON validator.
#### Submit a job
Once you are satisfied that your data is staged and the job template file contains
the instructions you want to use to run the job, use the jobs-submit
command
to submit the job:
% jobs-submit -F kallisto_job.json
Successfully submitted job 833421020533756391-242ac11b-0001-007
If there are no errors, you will see a success message along with a long unique
identifier (UID) for your job. You can monitor the progress of the job with the jobs-list
command and, optionally, the job UID:
% jobs-list
833421020533756391-242ac11b-0001-007 PENDING
#### Download the results
Once the job status is FINISHED
, you can list what output is available:
% jobs-output-list 833421020533756391-242ac11b-0001-007
.agave.log
kallisto-test-1518033119-833421020533756391-242ac11b-0001-007.err
kallisto-test-1518033119-833421020533756391-242ac11b-0001-007.out
output
reads_1.fastq.gz
reads_2.fastq.gz
transcript.idx
transcripts.fasta.gz
Because this job had attribute "archive":true
in the job template, the output
files were archived in your tacc-work
space under a folder named after your
username. A copy of the output also remains in the jobs API space for easy access
and, if desired, chaining into other jobs. The important output for this
particular job is all located in the output
subdirectory. Download just the
output
subdirectory using the following command:
% jobs-output-get -r 833421020533756391-242ac11b-0001-007 /output/
Downloading output/abundance.h5 ...
######################################################################## 100.0%
Downloading output/abundance.tsv ...
######################################################################## 100.0%
Downloading output/run_info.json ...
######################################################################## 100.0%
Or, you can download all of the job files including output, logs, and other run time files using the following command:
% jobs-output-get -r 833421020533756391-242ac11b-0001-007
Return to the API Documentation Overview