Commit 5522e88

Docs for resolving SoftwareRequirements.
1 parent 75c8d81 commit 5522e88

2 files changed: +208 -2 lines changed

README.rst

Lines changed: 206 additions & 0 deletions
@@ -139,6 +139,212 @@ The easiest way to use cwltool to run a tool or workflow from Python is to use a

  # result["out"] == "foo"

Leveraging SoftwareRequirements (Beta)
--------------------------------------

CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
may in turn use to resolve to packages in various package managers or
dependency management systems such as `Environment Modules
<http://modules.sourceforge.net/>`__.

Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional
dependency; for this reason, be sure to specify the ``deps`` modifier when
installing cwltool. For instance::

  $ pip install 'cwltool[deps]'

Installing cwltool in this fashion enables several new command line options.
The most general of these options is ``--beta-dependency-resolvers-configuration``.
This option allows one to specify a dependency resolvers configuration file.
This file may be specified as either XML or YAML and simply lists the
plugins to enable to "resolve" ``SoftwareRequirement`` dependencies.

To discuss some of these plugins and how to configure them, first consider the
following ``hint`` definition for an example CWL tool.

.. code:: yaml

  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93

Now imagine deploying cwltool on a cluster with Software Modules installed
174+
and that a ``seqtk`` module is avaialble at version ``r93``. This means cluster
175+
users likely won't have the ``seqtk`` the binary on their ``PATH`` by default but after
176+
sourcing this module with the command ``modulecmd sh load seqtk/r93`` ``seqtk`` is
177+
available on the ``PATH``. A simple dependency resolvers configuration file, called
178+
``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
179+
the correct module environment before executing the above tool would simply be:
180+
181+
.. code:: yaml
182+
183+
- type: module
184+
The outer list indicates that one plugin is being enabled; the plugin parameters are
defined as a dictionary for this one list item. There is only one required parameter
for the plugin above: ``type``, which defines the plugin type. This parameter
is required for all plugins. The available plugins and the parameters
available for each are documented (incompletely) `here
<https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
Unfortunately, this documentation is in the context of Galaxy tool
``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map
fairly directly.

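The structure described above (a list of plugin dictionaries, each with a
required ``type``) can be checked with a few lines of Python. This is only a
minimal sketch operating on an already-parsed configuration; the function name
``check_resolvers_conf`` is hypothetical and is not part of cwltool or
galaxy-lib.

```python
def check_resolvers_conf(conf):
    """Return the plugin types listed in a parsed resolvers configuration.

    Hypothetical helper for illustration; not a cwltool or galaxy-lib API.
    """
    if not isinstance(conf, list):
        raise ValueError("expected a list of plugin definitions")
    types = []
    for index, plugin in enumerate(conf):
        # Every plugin is a dictionary and must at least declare its type.
        if not isinstance(plugin, dict) or "type" not in plugin:
            raise ValueError(f"plugin #{index} is missing the required 'type'")
        types.append(plugin["type"])
    return types


# Parsed equivalent of the YAML document "- type: module".
print(check_resolvers_conf([{"type": "module"}]))
# prints: ['module']
```

Any further keys in a plugin dictionary (such as ``base_path`` for
``galaxy_packages``) are plugin-specific and are passed through unchecked here.
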
cwltool is distributed with an example of such a seqtk tool and a corresponding
sample job. It could be executed from the cwltool root using a dependency resolvers
configuration file such as the one above using the command::

  cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
    tests/seqtk_seq.cwl \
    tests/seqtk_seq_job.json

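To make the link between the seqtk hint and the ``modulecmd`` command mentioned
earlier concrete, the sketch below builds that command from a package name and
version. It assumes the ``name/version`` module naming used in this example;
``module_load_command`` is a hypothetical name, and this is not how cwltool or
galaxy-lib actually implements the ``module`` plugin.

```python
def module_load_command(package, version=None, shell="sh"):
    """Build the argv for loading an Environment Module (illustrative only)."""
    # Modules are addressed here as "name/version"; without a version the
    # module system is left to pick its own default.
    module = f"{package}/{version}" if version else package
    return ["modulecmd", shell, "load", module]


print(" ".join(module_load_command("seqtk", "r93")))
# prints: modulecmd sh load seqtk/r93
```
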
This example demonstrates both that cwltool can leverage
existing software installations and also handle workflows with dependencies
on different versions of the same software and libraries. However, the above
example does require an existing module setup, so it is impossible to test this example
"out of the box" with cwltool. For a more isolated test that demonstrates all
the same concepts, the resolver plugin type ``galaxy_packages`` can be used.

"Galaxy packages" are a lighter-weight alternative to Environment Modules that are
really just defined by a way to lay out directories into packages and versions
to find little scripts that are sourced to modify the environment. They have
been used for years in the Galaxy community to adapt Galaxy tools to cluster
environments but require neither knowledge of Galaxy nor any special tools to
set up. These should work just fine for CWL tools.

The cwltool source code repository's test directory is set up with a very simple
directory that defines a set of "Galaxy packages" (but really just defines one
package named ``random-lines``). The directory layout is simply::

  tests/test_deps_env/
    random-lines/
      1.0/
        env.sh

If the ``galaxy_packages`` plugin is enabled and pointed at the
``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
such as the following is encountered:

.. code:: yaml

  hints:
    SoftwareRequirement:
      packages:
      - package: 'random-lines'
        version:
        - '1.0'

then cwltool will simply find that ``env.sh`` file and source it before executing
the corresponding tool. That ``env.sh`` script is only responsible for modifying
the job's ``PATH`` to add the required binaries.

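The lookup described above amounts to joining the base path, package name, and
version into a directory and checking for ``env.sh`` inside it. The following
is a minimal sketch under that assumption; ``find_env_script`` is a hypothetical
helper, not the resolver code that ships with galaxy-lib.

```python
import os


def find_env_script(base_path, package, version):
    """Return <base_path>/<package>/<version>/env.sh if it exists, else None.

    Hypothetical helper that mirrors the directory layout shown above.
    """
    candidate = os.path.join(base_path, package, version, "env.sh")
    return candidate if os.path.isfile(candidate) else None


# Run from cwltool's root this should locate the bundled test package;
# run anywhere else it prints None.
print(find_env_script("tests/test_deps_env", "random-lines", "1.0"))
```

A real resolver would then arrange for the returned script to be sourced in the
job's shell before the tool command runs.
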
This is a full example that works since resolving "Galaxy packages" has no
external requirements. Try it out by executing the following command from cwltool's
root directory::

  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
    tests/random_lines.cwl \
    tests/random_lines_job.json

The resolvers configuration file in the above example was simply:

.. code:: yaml

  - type: galaxy_packages
    base_path: ./tests/test_deps_env

It is possible that the ``SoftwareRequirement`` s in a given CWL tool will not
match the module names for a given cluster. Such requirements can be re-mapped
to specific deployed packages and/or versions using another file specified using
the resolver plugin parameter ``mapping_files``. We will
demonstrate this using ``galaxy_packages``, but the concepts apply equally well
to Environment Modules or Conda packages (described below), for instance.

Consider the resolvers configuration file
(``tests/test_deps_env_resolvers_conf_rewrite.yml``):

.. code:: yaml

  - type: galaxy_packages
    base_path: ./tests/test_deps_env
    mapping_files: ./tests/test_deps_mapping.yml

And the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):

.. code:: yaml

  - from:
      name: randomLines
      version: 1.0.0-rc1
    to:
      name: random-lines
      version: '1.0'

This says that if cwltool encounters a requirement of ``randomLines`` at version
``1.0.0-rc1`` in a tool, it should rewrite it for our specific plugin as ``random-lines`` at
version ``1.0``. cwltool has such a test tool called ``random_lines_mapping.cwl``
that contains such a source ``SoftwareRequirement``. To try out this example with
mapping, execute the following command from the cwltool root directory::

  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
    tests/random_lines_mapping.cwl \
    tests/random_lines_job.json

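The rewrite performed by a mapping file can be pictured as a lookup over
``from``/``to`` rules. The sketch below works on the parsed form of the mapping
file shown above; ``apply_mappings`` is a hypothetical name, and treating a rule
without a ``version`` as matching any version is an assumption rather than
documented galaxy-lib behaviour.

```python
def apply_mappings(name, version, rules):
    """Rewrite (name, version) using the first matching from/to rule."""
    for rule in rules:
        source, target = rule["from"], rule["to"]
        if source["name"] != name:
            continue
        # Assumption: a rule that names no version matches every version.
        if "version" in source and str(source["version"]) != str(version):
            continue
        return target["name"], target.get("version", version)
    # No rule matched, so the requirement is left untouched.
    return name, version


# Parsed equivalent of tests/test_deps_mapping.yml as shown above.
rules = [{"from": {"name": "randomLines", "version": "1.0.0-rc1"},
          "to": {"name": "random-lines", "version": "1.0"}}]
print(apply_mappings("randomLines", "1.0.0-rc1", rules))
# prints: ('random-lines', '1.0')
```
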
The previous examples demonstrated leveraging existing infrastructure to
provide requirements for CWL tools. If instead a real package manager is used,
cwltool has the opportunity to install requirements as needed. While initial
support for Homebrew/Linuxbrew plugins is available, the most developed such
plugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties
of allowing multiple versions of a package to be installed simultaneously,
not requiring elevated permissions to install Conda itself or packages using
Conda, and being cross-platform. For these reasons, cwltool may run as a normal
user, install its own Conda environment, and manage multiple versions of Conda packages
on both Linux and Mac OS X.

The Conda plugin is highly configurable, but a sensible set of defaults
that has proven a powerful stack for dependency management within the Galaxy tool
development ecosystem can be enabled by simply passing cwltool the
``--beta-conda-dependencies`` flag.

With this we can use the seqtk example above without Docker and without
any externally managed services: cwltool should install everything it needs
and create an environment for the tool. Try it out with the following command::

  cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json

The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s
that allow disambiguation of package names. If the mapping files described above
allow deployers to adapt tools to their infrastructure, this mechanism allows
tools to adapt their requirements to multiple package managers. To demonstrate
this within the context of the seqtk example, we can simply break the package name we
use and then specify a specific Conda package as follows:

.. code:: yaml

  hints:
    SoftwareRequirement:
      packages:
      - package: seqtk_seq
        version:
        - '1.2'
        specs:
        - https://anaconda.org/bioconda/seqtk
        - https://packages.debian.org/sid/seqtk

The example can be executed using the command::

  cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json

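One way to picture how ``specs`` URIs disambiguate a package name is to pick out
the entry that belongs to the package manager in use. The sketch below does this
for ``anaconda.org`` URIs of the form shown above; ``conda_target_from_specs``
is a hypothetical helper, and reading the URI path as ``/<channel>/<package>``
is an assumption based on the example, not a description of the actual Conda
plugin.

```python
from urllib.parse import urlparse


def conda_target_from_specs(specs):
    """Return (channel, package) for the first anaconda.org URI, else None."""
    for spec in specs:
        parsed = urlparse(spec)
        parts = [part for part in parsed.path.split("/") if part]
        # Assumption: anaconda.org package pages look like /<channel>/<package>.
        if parsed.netloc == "anaconda.org" and len(parts) >= 2:
            return parts[0], parts[1]
    return None


specs = ["https://anaconda.org/bioconda/seqtk",
         "https://packages.debian.org/sid/seqtk"]
print(conda_target_from_specs(specs))
# prints: ('bioconda', 'seqtk')
```

With a lookup like this, the deliberately wrong ``seqtk_seq`` package name no
longer matters, because the Conda package is identified by its URI.
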
The plugin framework for managing resolution of these software requirements
is maintained as part of `galaxy-lib <https://github.com/galaxyproject/galaxy-lib>`__, a small, portable subset of the Galaxy
project. More information on configuration and implementation can be found
at the following links:

- `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
- `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
- `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
- `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
- `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__

Cwltool control flow
--------------------
Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 
-#PACKAGE_DIRECTORY="/path/to/cwlroot/tests/test_deps_env/random-lines/1.0/"
+#PACKAGE_DIRECTORY="/path/to/cwlroot"
 
 # This shouldn't need to use bash-isms - but we don't know the full path to this file,
 # so for testing it is setup this way. For actual deployments just using full paths
 # directly would be preferable.
-PACKAGE_DIRECTORY="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PACKAGE_DIRECTORY="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/tests/test_deps_env/random-lines/1.0/"
 export PATH=$PATH:$PACKAGE_DIRECTORY/scripts
