Commit 1ed3ed9

Enhance cwltoil to support SoftwareRequirements & BioContainers.
This enables the reproducibility stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely, this enables all the same options in cwltoil that were added to cwltool in common-workflow-language/cwltool#214, including ``--beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil.

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints with Conda, using the ``tests/seqtk_seq.cwl`` tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints``:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

By default we probably don't have the ``seqtk`` binary on our ``PATH``, so running the tool with ``cwltoil`` using the following command fails:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

The tool should now be successful. The Conda support can be endlessly tweaked, but the defaults target the best-practice Conda channels that work well for the Galaxy project.

Additional ``SoftwareRequirement`` resolution options are available, including targeting Software Modules, lmod, Homebrew, and simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html).

In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently.

Continuing with the example above, the new ``--beta-use-biocontainers`` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically, or to build one to use locally (required, for instance, for tools with multiple software requirements - fat tools).

```
cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross-platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
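To predict whether the plain ``cwltoil`` run above will fail, one can check up front whether ``seqtk`` is on the ``PATH``. The following is a minimal sketch, assuming Python 3 (for ``shutil.which``); the helper name ``find_tool`` is hypothetical and not part of Toil or cwltool.

```python
import shutil


def find_tool(name):
    """Return the absolute path of `name` if it is on PATH, else None.

    Hypothetical helper for illustration; not part of Toil or cwltool.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("seqtk")
    if path is None:
        # Without a resolver flag, the tool's baseCommand cannot be found.
        print("seqtk not on PATH; the plain cwltoil run is expected to fail")
    else:
        print("seqtk found at", path)
```

If the lookup returns ``None``, passing ``--beta-conda-dependencies`` or ``--beta-use-biocontainers`` is the way this commit offers to supply the binary.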
1 parent 491e3cc commit 1ed3ed9

7 files changed, +129 -18 lines changed

setup.py

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ def runSetup():
             'cwl': [
                 'cwltool==1.0.20170822192924',
                 'schema-salad >= 2.6, < 3',
+                'galaxy-lib==17.9.3',
                 'cwltest>=1.0.20170214185319']},
         package_dir={'': 'src'},
         packages=find_packages(where='src',

src/toil/cwl/cwltoil.py

Lines changed: 39 additions & 9 deletions
@@ -40,6 +40,7 @@
 import cwltool.draft2tool
 from cwltool.pathmapper import PathMapper, adjustDirObjs, adjustFileObjs, get_listing, MapperEnt, visit_class, normalizeFilesDirs
 from cwltool.process import shortname, fillInDefaults, compute_checksums, collectFilesAndDirs, stageFiles
+from cwltool.software_requirements import DependenciesConfiguration, get_container_from_software_requirements
 from cwltool.utils import aslist
 import schema_salad.validate as validate
 import schema_salad.ref_resolver
@@ -831,6 +832,14 @@ def main(args=None, stdout=sys.stdout):
                         metavar=("VAR1 VAR2"),
                         default=("PATH",),
                         dest="preserve_environment")
+    # help="Dependency resolver configuration file describing how to adapt 'SoftwareRequirement' packages to current system."
+    parser.add_argument("--beta-dependency-resolvers-configuration", default=None)
+    # help="Defaut root directory used by dependency resolvers configuration."
+    parser.add_argument("--beta-dependencies-directory", default=None)
+    # help="Use biocontainers for tools without an explicitly annotated Docker container."
+    parser.add_argument("--beta-use-biocontainers", default=None, action="store_true")
+    # help="Short cut to use Conda to resolve 'SoftwareRequirement' packages."
+    parser.add_argument("--beta-conda-dependencies", default=None, action="store_true")
 
     # mkdtemp actually creates the directory, but
     # toil requires that the directory not exist,
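The four new flags are later read back through attribute names with underscores (for example ``beta_conda_dependencies``). The sketch below shows how ``argparse`` derives those names and what the ``default=None`` plus ``action="store_true"`` combination yields. It reproduces only the four ``add_argument`` calls from the hunk above; the function name ``build_parser`` and the ``prog`` value are assumptions for illustration.

```python
import argparse


def build_parser():
    # Only the four options added in this commit; the rest of cwltoil's
    # parser is omitted. `build_parser` is a hypothetical name.
    parser = argparse.ArgumentParser(prog="cwltoil-sketch")
    parser.add_argument("--beta-dependency-resolvers-configuration", default=None)
    parser.add_argument("--beta-dependencies-directory", default=None)
    parser.add_argument("--beta-use-biocontainers", default=None, action="store_true")
    parser.add_argument("--beta-conda-dependencies", default=None, action="store_true")
    return parser


options = build_parser().parse_args(["--beta-conda-dependencies"])

# argparse turns dashes in long option names into underscores.
print(options.beta_conda_dependencies)                  # True
# An unset store_true flag keeps the explicit default, here None (falsy).
print(options.beta_use_biocontainers)                   # None
print(options.beta_dependency_resolvers_configuration)  # None
```

Because ``None`` is falsy, checks such as ``if conf_file or use_conda_dependencies`` behave the same as they would with a ``False`` default.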
@@ -853,22 +862,32 @@ def main(args=None, stdout=sys.stdout):
     outdir = os.path.abspath(options.outdir)
     fileindex = {}
     existing = {}
+    make_tool_kwargs = {}
+    conf_file = getattr(options, "beta_dependency_resolvers_configuration", None) # Text
+    use_conda_dependencies = getattr(options, "beta_conda_dependencies", None) # Text
+    job_script_provider = None
+    if conf_file or use_conda_dependencies:
+        dependencies_configuration = DependenciesConfiguration(options) # type: DependenciesConfiguration
+        job_script_provider = dependencies_configuration
+
+    options.default_container = None
+    make_tool_kwargs["find_default_container"] = functools.partial(find_default_container, options)
 
     with Toil(options) as toil:
         if options.restart:
             outobj = toil.restart()
         else:
             useStrict = not options.not_strict
+            make_tool_kwargs["hints"] = [{
+                "class": "ResourceRequirement",
+                "coresMin": toil.config.defaultCores,
+                "ramMin": toil.config.defaultMemory / (2**20),
+                "outdirMin": toil.config.defaultDisk / (2**20),
+                "tmpdirMin": 0
+            }]
             try:
                 t = cwltool.load_tool.load_tool(options.cwltool, toilMakeTool,
-                                                kwargs={
-                                                    "hints": [{
-                                                        "class": "ResourceRequirement",
-                                                        "coresMin": toil.config.defaultCores,
-                                                        "ramMin": toil.config.defaultMemory / (2**20),
-                                                        "outdirMin": toil.config.defaultDisk / (2**20),
-                                                        "tmpdirMin": 0
-                                                    }]},
+                                                kwargs=make_tool_kwargs,
                                                 resolver=cwltool.resolver.tool_resolver,
                                                 strict=useStrict)
                 unsupportedRequirementsCheck(t.requirements)
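The ``ResourceRequirement`` hint in this hunk divides Toil's byte-valued defaults by ``2**20`` because CWL expresses ``ramMin`` and ``outdirMin`` in mebibytes. A small worked sketch of that conversion follows; the function ``resource_hint`` and the sample values (1 core, 2 GiB of memory, 3 GiB of disk) are assumptions chosen for illustration, not Toil's actual defaults.

```python
MIB = 2 ** 20  # bytes per mebibyte


def resource_hint(default_cores, default_memory_bytes, default_disk_bytes):
    """Build a ResourceRequirement hint the way the diff does (hypothetical helper)."""
    return {
        "class": "ResourceRequirement",
        "coresMin": default_cores,
        "ramMin": default_memory_bytes / MIB,
        "outdirMin": default_disk_bytes / MIB,
        "tmpdirMin": 0,
    }


# Illustrative values only: 1 core, 2 GiB RAM, 3 GiB disk.
hint = resource_hint(1, 2 * 2 ** 30, 3 * 2 ** 30)
print(hint["ramMin"])     # 2048.0
print(hint["outdirMin"])  # 3072.0
```

Under Python 3, ``/`` is true division, so the values come out as floats; under Python 2 the same expression on integers performs floor division.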
@@ -931,7 +950,8 @@ def setSecondary(fileobj):
             try:
                 (wf1, wf2) = makeJob(t, {}, use_container=use_container,
                                      preserve_environment=options.preserve_environment,
-                                     tmpdir=os.path.realpath(outdir), workdir=options.workDir)
+                                     tmpdir=os.path.realpath(outdir), workdir=options.workDir,
+                                     job_script_provider=job_script_provider)
             except cwltool.process.UnsupportedRequirement as e:
                 logging.error(e)
                 return 33
@@ -948,3 +968,13 @@ def setSecondary(fileobj):
         stdout.write(json.dumps(outobj, indent=4))
 
     return 0
+
+
+def find_default_container(args, builder):
+    default_container = None
+    if args.default_container:
+        default_container = args.default_container
+    elif args.beta_use_biocontainers:
+        default_container = get_container_from_software_requirements(args, builder)
+
+    return default_container
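``find_default_container`` takes ``(args, builder)``, yet it is registered via ``functools.partial(find_default_container, options)``, so the caller only needs to supply a ``builder``. The sketch below isolates that binding and the precedence between an explicit default container and the Biocontainers lookup. It runs without cwltool: ``lookup_biocontainer`` is a hypothetical stand-in for ``get_container_from_software_requirements``, and the image name it returns is made up.

```python
import functools
from types import SimpleNamespace


def lookup_biocontainer(args, builder):
    # Hypothetical stand-in for cwltool's
    # get_container_from_software_requirements; the returned name is invented.
    return "example.invalid/biocontainers/" + builder.package


def find_default_container(args, builder):
    # Same precedence as in the diff: an explicit default container wins,
    # otherwise fall back to a Biocontainers lookup if the flag is set.
    default_container = None
    if args.default_container:
        default_container = args.default_container
    elif args.beta_use_biocontainers:
        default_container = lookup_biocontainer(args, builder)
    return default_container


options = SimpleNamespace(default_container=None, beta_use_biocontainers=True)
finder = functools.partial(find_default_container, options)  # binds `args`

builder = SimpleNamespace(package="seqtk")  # minimal fake builder
print(finder(builder))  # example.invalid/biocontainers/seqtk
```

Binding ``options`` ahead of time keeps the callback signature down to the one argument supplied at tool-construction time, without resorting to a global.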

src/toil/test/cwl/2.fasta

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+>Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;
+gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt
+tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac
+cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg
+aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg
+gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt
+tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac
+ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt
+ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt
+tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa
+aacaattcta atggtcaaaa a

src/toil/test/cwl/2.fastq

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+@EAS54_6_R1_2_1_413_324
+CCCTTCTTGTCTTCAGCGTTTCTCC
++
+;;3;;;;;;;;;;;;7;;;;;;;88
+@EAS54_6_R1_2_1_540_792
+TTGGCAGGCCAAGGCCGATGGATCA
++
+;;;;;;;;;;;7;;;;;-;;;3;83
+@EAS54_6_R1_2_1_443_348
+GTTGCTTCTGGCGTGGGTGGGGGGG
++EAS54_6_R1_2_1_443_348
+;;;;;;;;;;;9;7;;.7;393333

src/toil/test/cwl/cwlTest.py

Lines changed: 36 additions & 9 deletions
@@ -27,22 +27,20 @@
 
 from toil.test import ToilTest, needs_cwl, slow
 
-
 @needs_cwl
 class CWLTest(ToilTest):
 
-    def _tester(self, cwlfile, jobfile, outDir, expect):
+    def _tester(self, cwlfile, jobfile, outDir, expect, main_args=[], out_name="output"):
         from toil.cwl import cwltoil
         rootDir = self._projectRootPath()
         st = StringIO()
-        cwltoil.main(['--outdir', outDir,
-                      os.path.join(rootDir, cwlfile),
-                      os.path.join(rootDir, jobfile)],
-                     stdout=st)
+        main_args = main_args[:]
+        main_args.extend(['--outdir', outDir, os.path.join(rootDir, cwlfile), os.path.join(rootDir, jobfile)])
+        cwltoil.main(main_args, stdout=st)
         out = json.loads(st.getvalue())
-        out["output"].pop("http://commonwl.org/cwltool#generation", None)
-        out["output"].pop("nameext", None)
-        out["output"].pop("nameroot", None)
+        out[out_name].pop("http://commonwl.org/cwltool#generation", None)
+        out[out_name].pop("nameext", None)
+        out[out_name].pop("nameroot", None)
         self.assertEquals(out, expect)
 
     def _debug_worker_tester(self, cwlfile, jobfile, outDir, expect):
@@ -148,3 +146,32 @@ def test_run_conformance(self):
                 if not only_unsupported:
                     print(e.output)
                     raise e
+
+    @slow
+    def test_bioconda(self):
+        outDir = self._createTempDir()
+        self._tester('src/toil/test/cwl/seqtk_seq.cwl',
+                     'src/toil/test/cwl/seqtk_seq_job.json',
+                     outDir,
+                     self._expected_seqtk_output(outDir),
+                     main_args=["--beta-conda-dependencies"],
+                     out_name="output1")
+
+    def test_biocontainers(self):
+        outDir = self._createTempDir()
+        self._tester('src/toil/test/cwl/seqtk_seq.cwl',
+                     'src/toil/test/cwl/seqtk_seq_job.json',
+                     outDir,
+                     self._expected_seqtk_output(outDir),
+                     main_args=["--beta-use-biocontainers"],
+                     out_name="output1")
+
+    def _expected_seqtk_output(self, outDir):
+        return {
+            u"output": {
+                u"location": "file://" + str(os.path.join(outDir, 'output.txt')),
+                u"checksum": u"sha1$322e001e5a99f19abdce9f02ad0f02a17b5066c2",
+                u"basename": str("out"),
+                u"class": u"File",
+                u"size": 150}
+        }
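The updated ``_tester`` declares ``main_args=[]`` and then immediately does ``main_args = main_args[:]`` before extending the list. The sketch below shows why that copy matters: a mutable default is created once and shared across calls, so extending it in place would accumulate arguments from one test to the next. Both function names here are hypothetical and only mirror the pattern in the diff.

```python
def build_args_shared(main_args=[]):
    # Mutates the shared default list in place: arguments pile up across calls.
    main_args.extend(["--outdir", "/tmp/out"])
    return main_args


def build_args_copied(main_args=[]):
    # Same pattern as the diff: copy first, then extend the copy.
    main_args = main_args[:]
    main_args.extend(["--outdir", "/tmp/out"])
    return main_args


shared_first = len(build_args_shared())
shared_second = len(build_args_shared())
copied_first = len(build_args_copied())
copied_second = len(build_args_copied())

print(shared_first, shared_second)  # 2 4
print(copied_first, copied_second)  # 2 2

# The copy also protects a list passed in by the caller.
caller_args = ["--beta-conda-dependencies"]
build_args_copied(caller_args)
print(caller_args)  # ['--beta-conda-dependencies']
```

A ``None`` default with ``main_args = list(main_args or [])`` would be an equivalent and more conventional spelling of the same safeguard.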

src/toil/test/cwl/seqtk_seq.cwl

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+cwlVersion: v1.0
+class: CommandLineTool
+id: "seqtk_seq"
+doc: "Convert to FASTA (seqtk)"
+inputs:
+  - id: input1
+    type: File
+    inputBinding:
+      position: 1
+      prefix: "-a"
+outputs:
+  - id: output1
+    type: File
+    outputBinding:
+      glob: out
+baseCommand: ["seqtk", "seq"]
+arguments: []
+stdout: out
+hints:
+  SoftwareRequirement:
+    packages:
+    - package: seqtk
+      version:
+      - r93
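The ``hints`` block above is what the dependency resolvers and the Biocontainers lookup work from. As a rough illustration of the information it carries, the sketch below pulls ``(package, version)`` pairs out of an already-parsed tool document. It assumes the map form of ``hints`` used in this file and represents the parsed YAML as a plain dict; the function ``software_packages`` is hypothetical and is not how cwltool or galaxy-lib implement resolution.

```python
def software_packages(tool):
    """Return (package, version) pairs from a map-form SoftwareRequirement hint.

    Hypothetical helper; a package without versions yields (package, None).
    """
    requirement = tool.get("hints", {}).get("SoftwareRequirement", {})
    pairs = []
    for entry in requirement.get("packages", []):
        versions = entry.get("version") or [None]
        for version in versions:
            pairs.append((entry["package"], version))
    return pairs


# The relevant part of seqtk_seq.cwl, as a parsed document.
seqtk_tool = {
    "class": "CommandLineTool",
    "baseCommand": ["seqtk", "seq"],
    "hints": {
        "SoftwareRequirement": {
            "packages": [{"package": "seqtk", "version": ["r93"]}]
        }
    },
}

print(software_packages(seqtk_tool))  # [('seqtk', 'r93')]
```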

src/toil/test/cwl/seqtk_seq_job.json

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+    "input1": {
+        "class": "File",
+        "location": "2.fastq"
+    }
+}
