
Commit 9900de0

Merge pull request #214 from jmchilton/galaxy_deps
Beta support for configurable dependency resolution & Biocontainers.
2 parents 6e01be1 + fd2ac01 commit 9900de0


150 files changed, +4094 -14 lines changed


README.rst

+206
@@ -139,6 +139,212 @@ The easiest way to use cwltool to run a tool or workflow from Python is to use a
# result["out"] == "foo"

Leveraging SoftwareRequirements (Beta)
--------------------------------------

CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
may in turn resolve to packages in various package managers or
dependency management systems such as `Environment Modules
<http://modules.sourceforge.net/>`__.

Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional
dependency, so be sure to specify the ``deps`` modifier when
installing cwltool. For instance::

  $ pip install 'cwltool[deps]'

Installing cwltool in this fashion enables several new command line options.
The most general of these options is ``--beta-dependency-resolvers-configuration``,
which allows one to specify a dependency resolvers configuration file.
This file may be written in either XML or YAML and simply lists the
plugins to enable for "resolving" ``SoftwareRequirement`` dependencies.

To discuss some of these plugins and how to configure them, first consider the
following ``hint`` definition for an example CWL tool.

.. code:: yaml

  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93

Now imagine deploying cwltool on a cluster with Environment Modules installed,
where a ``seqtk`` module is available at version ``r93``. This means cluster
users likely won't have the ``seqtk`` binary on their ``PATH`` by default, but after
sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
available on the ``PATH``. A simple dependency resolvers configuration file, called
``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
the correct module environment before executing the above tool would simply be:

.. code:: yaml

  - type: module

The outer list indicates that one plugin is being enabled; the plugin parameters are
defined as a dictionary for this one list item. There is only one required parameter
for the plugin above: ``type``, which defines the plugin type. This parameter
is required for all plugins. The available plugins and the parameters
available for each are documented (incompletely) `here
<https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
Unfortunately, this documentation is written in the context of Galaxy tool ``requirement``
elements instead of CWL ``SoftwareRequirement`` hints, but the concepts map fairly directly.
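A minimal sketch of the rules just described, assuming the YAML file has already been parsed into a Python list: every plugin entry must be a dictionary with a ``type`` key, and a ``module`` plugin is pictured as turning a (package, version) pair into the ``modulecmd`` command mentioned earlier. Both ``validate_resolvers_conf`` and ``module_load_command`` are hypothetical names; galaxy-lib's actual resolver classes are not reproduced here.

```python
from typing import Any, Dict, List


def validate_resolvers_conf(conf: Any) -> List[Dict[str, Any]]:
    """Check the shape described above: a list of plugin dicts, each with a 'type' key."""
    if not isinstance(conf, list):
        raise ValueError("resolvers configuration must be a list of plugins")
    for index, plugin in enumerate(conf):
        if not isinstance(plugin, dict) or "type" not in plugin:
            raise ValueError(f"plugin #{index} is missing the required 'type' parameter")
    return conf


def module_load_command(package: str, version: str) -> str:
    """Hypothetical 'module' plugin step: map a requirement to a modulecmd invocation."""
    # Assumption: modules are named <package>/<version>, as with seqtk/r93 above.
    return f"modulecmd sh load {package}/{version}"


conf = validate_resolvers_conf([{"type": "module"}])
print(conf[0]["type"])                      # module
print(module_load_command("seqtk", "r93"))  # modulecmd sh load seqtk/r93
```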

cwltool is distributed with an example of such a seqtk tool and a corresponding sample
job. It can be executed from the cwltool root directory, using a dependency resolvers
configuration file such as the one above, with the command::

  cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
      tests/seqtk_seq.cwl \
      tests/seqtk_seq_job.json

This example demonstrates both that cwltool can leverage
existing software installations and that it can handle workflows with dependencies
on different versions of the same software and libraries. However, the above
example requires an existing module setup, so it cannot be tested
"out of the box" with cwltool. For a more isolated test that demonstrates all
the same concepts, the resolver plugin type ``galaxy_packages`` can be used.

"Galaxy packages" are a lighter-weight alternative to Environment Modules. They are
really just a convention for laying out directories by package and version
so that small scripts can be found and sourced to modify the environment. They have
been used for years in the Galaxy community to adapt Galaxy tools to cluster
environments, but they require neither knowledge of Galaxy nor any special tools to
set up. They should work just fine for CWL tools.

The cwltool source code repository's test directory is set up with a very simple
directory that defines a set of "Galaxy packages" (really just one
package named ``random-lines``). The directory layout is simply::

  tests/test_deps_env/
    random-lines/
      1.0/
        env.sh

Suppose the ``galaxy_packages`` plugin is enabled and pointed at the
``tests/test_deps_env`` directory in cwltool's root, and a ``SoftwareRequirement``
such as the following is encountered:

.. code:: yaml

  hints:
    SoftwareRequirement:
      packages:
      - package: 'random-lines'
        version:
        - '1.0'

cwltool will then simply find that ``env.sh`` file and source it before executing
the corresponding tool. That ``env.sh`` script is only responsible for modifying
the job's ``PATH`` to add the required binaries.
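The lookup and sourcing steps can be sketched as follows, assuming the ``<base_path>/<package>/<version>/env.sh`` layout shown above. ``find_env_script`` and ``sketch_job_script`` are hypothetical helpers, and the ``PATH`` entry written into the demo ``env.sh`` is a made-up install location; this illustrates the idea rather than cwltool's own job-script code.

```python
import os
import shlex
import tempfile
from typing import List, Optional


def find_env_script(base_path: str, package: str, version: str) -> Optional[str]:
    """Return <base_path>/<package>/<version>/env.sh if that file exists, else None."""
    candidate = os.path.join(base_path, package, version, "env.sh")
    return candidate if os.path.isfile(candidate) else None


def sketch_job_script(env_scripts: List[str], command: List[str]) -> str:
    """Build a bash script that sources each env.sh, then runs the tool command."""
    lines = ["#!/bin/bash"]
    lines += [". " + shlex.quote(path) for path in env_scripts]  # '.' sources the file
    lines.append(" ".join(shlex.quote(arg) for arg in command))
    return "\n".join(lines) + "\n"


with tempfile.TemporaryDirectory() as base:
    # Recreate the tests/test_deps_env layout in a scratch directory.
    pkg_dir = os.path.join(base, "random-lines", "1.0")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "env.sh"), "w") as fh:
        fh.write('export PATH="/opt/random-lines/bin:$PATH"\n')  # hypothetical install dir
    env_sh = find_env_script(base, "random-lines", "1.0")
    assert env_sh is not None
    # Hypothetical command line for the tool.
    print(sketch_job_script([env_sh], ["random-lines", "input.txt"]))
```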

This is a full working example, since resolving "Galaxy packages" has no
external requirements. Try it out by executing the following command from cwltool's
root directory::

  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
      tests/random_lines.cwl \
      tests/random_lines_job.json

The resolvers configuration file in the above example was simply:

.. code:: yaml

  - type: galaxy_packages
    base_path: ./tests/test_deps_env

It is possible that the ``SoftwareRequirement`` hints in a given CWL tool will not
match the module names for a given cluster. Such requirements can be re-mapped
to specific deployed packages and/or versions using another file, specified with
the resolver plugin parameter ``mapping_files``. We will
demonstrate this using ``galaxy_packages``, but the concepts apply equally well
to, for instance, Environment Modules or Conda packages (described below).

So consider the resolvers configuration file
(``tests/test_deps_env_resolvers_conf_rewrite.yml``):

.. code:: yaml

  - type: galaxy_packages
    base_path: ./tests/test_deps_env
    mapping_files: ./tests/test_deps_mapping.yml

And the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):

.. code:: yaml

  - from:
      name: randomLines
      version: 1.0.0-rc1
    to:
      name: random-lines
      version: '1.0'

This says that if cwltool encounters a requirement of ``randomLines`` at version
``1.0.0-rc1`` in a tool, it should rewrite it for our specific plugin as ``random-lines`` at
version ``1.0``. cwltool has such a test tool, called ``random_lines_mapping.cwl``,
that contains such a source ``SoftwareRequirement``. To try out this example with
mapping, execute the following command from the cwltool root directory::

  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
      tests/random_lines_mapping.cwl \
      tests/random_lines_job.json
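A sketch of the rewrite rule described above, operating on mapping entries already loaded from YAML. ``apply_mappings`` is a hypothetical name, and treating a ``from`` entry without a ``version`` as matching any version is an assumption of this sketch rather than documented galaxy-lib behaviour.

```python
from typing import Dict, List, Tuple

# One rule as loaded from a mapping file like tests/test_deps_mapping.yml.
Rule = Dict[str, Dict[str, str]]


def apply_mappings(name: str, version: str, rules: List[Rule]) -> Tuple[str, str]:
    """Rewrite (name, version) with the first rule whose 'from' side matches."""
    for rule in rules:
        source = rule["from"]
        # Assumption: a rule that omits 'version' on its 'from' side matches any version.
        if source["name"] == name and source.get("version", version) == version:
            target = rule["to"]
            return target["name"], target.get("version", version)
    return name, version  # no rule matched: keep the tool's own requirement


rules = [{"from": {"name": "randomLines", "version": "1.0.0-rc1"},
          "to": {"name": "random-lines", "version": "1.0"}}]
print(apply_mappings("randomLines", "1.0.0-rc1", rules))  # ('random-lines', '1.0')
print(apply_mappings("seqtk", "r93", rules))              # ('seqtk', 'r93')
```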

The previous examples demonstrated leveraging existing infrastructure to
provide requirements for CWL tools. If a real package manager is used instead,
cwltool has the opportunity to install requirements as needed. While initial
support for Homebrew/Linuxbrew plugins is available, the most developed such
plugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties
of allowing multiple versions of a package to be installed simultaneously,
not requiring elevated permissions to install Conda itself or packages using
Conda, and being cross-platform. For these reasons, cwltool may run as a normal
user, install its own Conda environment, and manage multiple versions of Conda packages
on both Linux and Mac OS X.

The Conda plugin can be extensively configured, but a sensible set of defaults
that has proven a powerful stack for dependency management within the Galaxy tool
development ecosystem can be enabled by simply passing cwltool the
``--beta-conda-dependencies`` flag.

With this flag, we can use the seqtk example above without Docker and without
any externally managed services; cwltool should install everything it needs
and create an environment for the tool. Try it out with the following command::

  cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json

The CWL specification allows URIs (``specs``) to be attached to ``SoftwareRequirement``
packages to disambiguate package names. Whereas the mapping files described above
allow deployers to adapt tools to their infrastructure, this mechanism allows
tools to adapt their requirements to multiple package managers. To demonstrate
this in the context of the seqtk example, we can deliberately break the package name we
use and then point to a specific Conda package as follows:

.. code:: yaml

  hints:
    SoftwareRequirement:
      packages:
      - package: seqtk_seq
        version:
        - '1.2'
        specs:
        - https://anaconda.org/bioconda/seqtk
        - https://packages.debian.org/sid/seqtk

The example can be executed using the command::

  cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
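One way to picture how a Conda-oriented resolver could use ``specs``: look for an ``anaconda.org/<channel>/<package>`` URI and take the channel and package name from it instead of the deliberately broken ``package`` field. ``conda_target_from_specs`` is a hypothetical helper; galaxy-lib's real handling of specification URIs may differ.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse


def conda_target_from_specs(specs: List[str]) -> Optional[Tuple[str, str]]:
    """Return (channel, package) from the first anaconda.org spec URI, if any."""
    for uri in specs:
        parsed = urlparse(uri)
        if parsed.netloc != "anaconda.org":
            continue  # e.g. the packages.debian.org URI targets a different manager
        parts = [part for part in parsed.path.split("/") if part]
        if len(parts) >= 2:
            return parts[0], parts[1]
    return None


specs = ["https://anaconda.org/bioconda/seqtk",
         "https://packages.debian.org/sid/seqtk"]
print(conda_target_from_specs(specs))  # ('bioconda', 'seqtk')
```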

The plugin framework for managing resolution of these software requirements
is maintained as part of `galaxy-lib <https://github.com/galaxyproject/galaxy-lib>`__, a small, portable subset of the Galaxy
project. More information on configuration and implementation can be found
at the following links:

- `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
- `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
- `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
- `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
- `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__

Cwltool control flow
--------------------

cwltool/builder.py

+2
@@ -50,6 +50,8 @@ def __init__(self): # type: () -> None
         # Will be default "no_listing" for CWL v1.1
         self.loadListing = "deep_listing" # type: Union[None, str]
 
+        self.find_default_container = None # type: Callable[[], Text]
+
     def bind_input(self, schema, datum, lead_pos=None, tail_pos=None):
         # type: (Dict[Text, Any], Any, Union[int, List[int]], List[int]) -> List[Dict[Text, Any]]
         if tail_pos is None:

cwltool/draft2tool.py

+10
@@ -174,9 +174,19 @@ class CommandLineTool(Process):
     def __init__(self, toolpath_object, **kwargs):
         # type: (Dict[Text, Any], **Any) -> None
         super(CommandLineTool, self).__init__(toolpath_object, **kwargs)
+        self.find_default_container = kwargs.get("find_default_container", None)
 
     def makeJobRunner(self, use_container=True): # type: (Optional[bool]) -> JobBase
         dockerReq, _ = self.get_requirement("DockerRequirement")
+        if not dockerReq and use_container:
+            default_container = self.find_default_container(self)
+            if default_container:
+                self.requirements.insert(0, {
+                    "class": "DockerRequirement",
+                    "dockerPull": default_container
+                })
+                dockerReq = self.requirements[0]
+
         if dockerReq and use_container:
             return DockerCommandLineJob()
         else:
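The ``makeJobRunner`` change above prepends a ``DockerRequirement`` only when the tool has none and containers are enabled. The standalone sketch below restates that rule over a plain list of requirement dictionaries. ``ensure_docker_requirement`` is hypothetical, the ``find_default_container`` callable is simplified to take no arguments, and unlike ``get_requirement`` the sketch does not look at hints. The ``debian:stable`` image is just an example.

```python
from typing import Any, Callable, Dict, List, Optional

Requirement = Dict[str, Any]


def ensure_docker_requirement(
    requirements: List[Requirement],
    use_container: bool,
    find_default_container: Callable[[], Optional[str]],
) -> Optional[Requirement]:
    """Return the DockerRequirement to use, prepending a default one when none exists."""
    existing = next(
        (req for req in requirements if req.get("class") == "DockerRequirement"), None)
    if existing is None and use_container:
        image = find_default_container()
        if image:
            existing = {"class": "DockerRequirement", "dockerPull": image}
            requirements.insert(0, existing)  # front of the list, as in the diff
    return existing


reqs: List[Requirement] = [{"class": "SoftwareRequirement", "packages": []}]
ensure_docker_requirement(reqs, True, lambda: "debian:stable")
print(reqs[0]["class"])  # DockerRequirement
```

An explicitly declared ``DockerRequirement`` always wins in this sketch; the default is consulted only as a fallback.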

cwltool/job.py

+11 -6
@@ -33,6 +33,7 @@
 PYTHON_RUN_SCRIPT = """
 import json
+import os
 import sys
 import subprocess
@@ -41,6 +42,7 @@
     commands = popen_description["commands"]
     cwd = popen_description["cwd"]
     env = popen_description["env"]
+    env["PATH"] = os.environ.get("PATH")
     stdin_path = popen_description["stdin_path"]
     stdout_path = popen_description["stdout_path"]
     stderr_path = popen_description["stderr_path"]
@@ -67,7 +69,7 @@
     if sp.stdin:
         sp.stdin.close()
     rcode = sp.wait()
-    if isinstance(stdin, file):
+    if stdin is not subprocess.PIPE:
         stdin.close()
     if stdout is not sys.stderr:
         stdout.close()
@@ -145,7 +147,6 @@ def _setup(self): # type: () -> None
         _logger.debug(u"[job %s] initial work dir %s", self.name,
                       json.dumps({p: self.generatemapper.mapper(p) for p in self.generatemapper.files()}, indent=4))
 
-
     def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
         # type: (List[Text], MutableMapping[Text, Text], bool, Text) -> None
@@ -328,8 +329,12 @@ def run(self, pull_image=True, rm_container=True,
         env = cast(MutableMapping[Text, Text], os.environ)
         if docker_req and kwargs.get("use_container") is not False:
             img_id = docker.get_from_requirements(docker_req, True, pull_image)
-        elif kwargs.get("default_container", None) is not None:
-            img_id = kwargs.get("default_container")
+        if img_id is None:
+            find_default_container = self.builder.find_default_container
+            default_container = find_default_container and find_default_container()
+            if default_container:
+                img_id = default_container
+                env = cast(MutableMapping[Text, Text], os.environ)
 
         if docker_req and img_id is None and kwargs.get("use_container"):
             raise Exception("Docker image not available")
@@ -482,8 +487,8 @@ def _job_popen(
             ["bash", job_script.encode("utf-8")],
             shell=False,
             cwd=job_dir,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            stdout=sys.stderr, # The nested script will output the paths to the correct files if they need
+            stderr=sys.stderr, # to be captured. Else just write everything to stderr (same as above).
             stdin=subprocess.PIPE,
         )
         if sp.stdin:

cwltool/main.py

+37 -8
@@ -13,6 +13,7 @@
 import pkg_resources # part of setuptools
 import requests
+import string
 
 import ruamel.yaml as yaml
 import schema_salad.validate as validate
@@ -31,9 +32,11 @@
                       relocateOutputs, scandeps, shortname, use_custom_schema,
                       use_standard_schema)
 from .resolver import ga4gh_tool_registries, tool_resolver
+from .software_requirements import DependenciesConfiguration, get_container_from_software_requirements
 from .stdfsaccess import StdFsAccess
 from .update import ALLUPDATES, UPDATES
 
+
 _logger = logging.getLogger("cwltool")
 
 defaultStreamHandler = logging.StreamHandler()
@@ -149,6 +152,15 @@ def arg_parser(): # type: () -> argparse.ArgumentParser
     exgroup.add_argument("--quiet", action="store_true", help="Only print warnings and errors.")
     exgroup.add_argument("--debug", action="store_true", help="Print even more logging")
 
+    # help="Dependency resolver configuration file describing how to adapt 'SoftwareRequirement' packages to current system."
+    parser.add_argument("--beta-dependency-resolvers-configuration", default=None, help=argparse.SUPPRESS)
+    # help="Defaut root directory used by dependency resolvers configuration."
+    parser.add_argument("--beta-dependencies-directory", default=None, help=argparse.SUPPRESS)
+    # help="Use biocontainers for tools without an explicitly annotated Docker container."
+    parser.add_argument("--beta-use-biocontainers", default=None, help=argparse.SUPPRESS, action="store_true")
+    # help="Short cut to use Conda to resolve 'SoftwareRequirement' packages."
+    parser.add_argument("--beta-conda-dependencies", default=None, help=argparse.SUPPRESS, action="store_true")
+
     parser.add_argument("--tool-help", action="store_true", help="Print command line help for tool")
 
     parser.add_argument("--relative-deps", choices=['primary', 'cwd'],
@@ -236,12 +248,6 @@ def output_callback(out, processStatus):
         for req in jobReqs:
             t.requirements.append(req)
 
-    if kwargs.get("default_container"):
-        t.requirements.insert(0, {
-            "class": "DockerRequirement",
-            "dockerPull": kwargs["default_container"]
-        })
-
     jobiter = t.job(job_order_object,
                     output_callback,
                     **kwargs)
@@ -648,7 +654,8 @@ def main(argsl=None, # type: List[str]
                      'relax_path_checks': False,
                      'validate': False,
                      'enable_ga4gh_tool_registry': False,
-                     'ga4gh_tool_registries': []
+                     'ga4gh_tool_registries': [],
+                     'find_default_container': None
                      }.iteritems():
             if not hasattr(args, k):
                 setattr(args, k, v)
@@ -716,8 +723,20 @@ def main(argsl=None, # type: List[str]
             stdout.write(json.dumps(processobj, indent=4))
             return 0
 
+        conf_file = getattr(args, "beta_dependency_resolvers_configuration", None) # Text
+        use_conda_dependencies = getattr(args, "beta_conda_dependencies", None) # Text
+
+        make_tool_kwds = vars(args)
+
+        build_job_script = None # type: Callable[[Any, List[str]], Text]
+        if conf_file or use_conda_dependencies:
+            dependencies_configuration = DependenciesConfiguration(args) # type: DependenciesConfiguration
+            make_tool_kwds["build_job_script"] = dependencies_configuration.build_job_script
+
+        make_tool_kwds["find_default_container"] = functools.partial(find_default_container, args)
+
         tool = make_tool(document_loader, avsc_names, metadata, uri,
-                         makeTool, vars(args))
+                         makeTool, make_tool_kwds)
 
         if args.validate:
             return 0
@@ -838,5 +857,15 @@ def locToPath(p):
         _logger.addHandler(defaultStreamHandler)
 
 
+def find_default_container(args, builder):
+    default_container = None
+    if args.default_container:
+        default_container = args.default_container
+    elif args.beta_use_biocontainers:
+        default_container = get_container_from_software_requirements(args, builder)
+
+    return default_container
+
+
 if __name__ == "__main__":
     sys.exit(main(sys.argv[1:]))
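The new ``find_default_container`` gives an explicit ``--default-container`` precedence over the Biocontainers lookup, and ``main()`` binds ``args`` with ``functools.partial`` so that tools only need to supply the builder later. The sketch below mirrors that pattern. ``pick_default_container`` and ``placeholder_lookup`` are hypothetical stand-ins, and the returned image name is a placeholder rather than a real Biocontainers image.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable, Optional


def pick_default_container(args: Any, builder: Any,
                           lookup: Callable[[Any], Optional[str]]) -> Optional[str]:
    """Same precedence as find_default_container above: explicit image first."""
    if args.default_container:
        return args.default_container
    if args.beta_use_biocontainers:
        return lookup(builder)
    return None


def placeholder_lookup(builder: Any) -> Optional[str]:
    # Hypothetical stand-in for get_container_from_software_requirements.
    return "example/placeholder-image"


args = SimpleNamespace(default_container=None, beta_use_biocontainers=True)
# Bind args (and the lookup) up front, as main() does with functools.partial,
# so later callers only have to supply the builder.
finder = functools.partial(pick_default_container, args, lookup=placeholder_lookup)
print(finder(None))  # example/placeholder-image
```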
