@@ -139,6 +139,212 @@ The easiest way to use cwltool to run a tool or workflow from Python is to use a
# result["out"] == "foo"

+ Leveraging SoftwareRequirements (Beta)
+ --------------------------------------
+
+ CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
+ can in turn use to resolve packages in various package managers or
+ dependency management systems such as `Environment Modules
+ <http://modules.sourceforge.net/>`__.
+
+ Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional
+ dependency, so be sure to specify the ``deps`` modifier when
+ installing cwltool. For instance::
+
+   $ pip install 'cwltool[deps]'
+
+ Installing cwltool in this fashion enables several new command line options.
+ The most general of these options is ``--beta-dependency-resolvers-configuration``.
+ This option allows one to specify a dependency resolvers configuration file.
+ This file may be specified as either XML or YAML and simply describes the
+ plugins to enable to "resolve" ``SoftwareRequirement`` dependencies.
+
+ To discuss some of these plugins and how to configure them, first consider the
+ following ``hint`` definition for an example CWL tool.
+
+ .. code:: yaml
+
+   SoftwareRequirement:
+     packages:
+     - package: seqtk
+       version:
+       - r93
+
+ Now imagine deploying cwltool on a cluster with Environment Modules installed
+ and that a ``seqtk`` module is available at version ``r93``. This means cluster
+ users likely won't have the ``seqtk`` binary on their ``PATH`` by default, but after
+ sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
+ available on the ``PATH``. A simple dependency resolvers configuration file, called
+ ``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
+ the correct module environment before executing the above tool would simply be:
+
+ .. code:: yaml
+
+   - type: module
+
+ The outer list indicates that one plugin is being enabled; the plugin parameters are
+ defined as a dictionary for this one list item. There is only one required parameter
+ for the plugin above: ``type``, which defines the plugin type. This parameter
+ is required for all plugins. The available plugins and the parameters
+ available for each are documented (incompletely) `here
+ <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
+ Unfortunately, this documentation is in the context of Galaxy tool
+ ``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts
+ map fairly directly.
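+
+ As a rough illustration of the structure just described (a list of plugin
+ dictionaries, each carrying a ``type``), the following Python sketch checks a
+ configuration that has already been loaded into Python objects. The function
+ name ``plugin_types`` and the ``./somewhere`` path are hypothetical; this is
+ not how cwltool or galaxy-lib validates the file.
+

```python
def plugin_types(resolvers_conf):
    """Return the ``type`` of each plugin entry, raising if one is missing."""
    if not isinstance(resolvers_conf, list):
        raise ValueError("expected a list of plugin definitions")
    types = []
    for index, plugin in enumerate(resolvers_conf):
        # Every plugin entry is a dictionary and must name its plugin type.
        if not isinstance(plugin, dict) or "type" not in plugin:
            raise ValueError("plugin entry %d has no 'type' parameter" % index)
        types.append(plugin["type"])
    return types


# The one-plugin configuration shown above, written as a Python literal.
enabled = plugin_types([{"type": "module"}])

# A hypothetical entry without ``type`` is rejected.
try:
    plugin_types([{"base_path": "./somewhere"}])
    rejected = False
except ValueError:
    rejected = True
```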
+
+ cwltool is distributed with an example of such a seqtk tool and a sample corresponding
+ job. It could be executed from the cwltool root using a dependency resolvers
+ configuration file such as the above one using the command::
+
+   cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
+       tests/seqtk_seq.cwl \
+       tests/seqtk_seq_job.json
+
+ This example demonstrates both that cwltool can leverage
+ existing software installations and also handle workflows with dependencies
+ on different versions of the same software and libraries. However, the above
+ example does require an existing module setup, so it is impossible to test this example
+ "out of the box" with cwltool. For a more isolated test that demonstrates all
+ the same concepts, the resolver plugin type ``galaxy_packages`` can be used.
+
+ "Galaxy packages" are a lighter-weight alternative to Environment Modules that are
+ really just defined by a way to lay out directories into packages and versions
+ to find little scripts that are sourced to modify the environment. They have
+ been used for years in the Galaxy community to adapt Galaxy tools to cluster
+ environments but require neither knowledge of Galaxy nor any special tools to
+ set up. These should work just fine for CWL tools.
+
+ The cwltool source code repository's test directory is set up with a very simple
+ directory that defines a set of "Galaxy packages" (but really just defines one
+ package named ``random-lines``). The directory layout is simply::
+
+   tests/test_deps_env/
+     random-lines/
+       1.0/
+         env.sh
+
+ If the ``galaxy_packages`` plugin is enabled and pointed at the
+ ``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
+ such as the following is encountered:
+
+ .. code:: yaml
+
+   hints:
+     SoftwareRequirement:
+       packages:
+       - package: 'random-lines'
+         version:
+         - '1.0'
+
+ then cwltool will simply find that ``env.sh`` file and source it before executing
+ the corresponding tool. That ``env.sh`` script is only responsible for modifying
+ the job's ``PATH`` to add the required binaries.
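+
+ The lookup implied by this directory layout can be sketched in a few lines of
+ Python. This is only an illustration under the assumption that a package
+ resolves to ``<base_path>/<package>/<version>/env.sh``; the helper name
+ ``find_env_script`` and the ``/hypothetical/bin`` path are invented here, and
+ the real resolver in galaxy-lib may behave differently.
+

```python
import os
import tempfile


def find_env_script(base_path, package, version):
    """Return the path to <base_path>/<package>/<version>/env.sh, or None."""
    candidate = os.path.join(base_path, package, version, "env.sh")
    return candidate if os.path.isfile(candidate) else None


# Build a throwaway copy of the layout shown above and look a package up.
with tempfile.TemporaryDirectory() as base_path:
    version_dir = os.path.join(base_path, "random-lines", "1.0")
    os.makedirs(version_dir)
    with open(os.path.join(version_dir, "env.sh"), "w") as handle:
        # Hypothetical script body: prepend a directory to PATH.
        handle.write('export PATH="/hypothetical/bin:$PATH"\n')

    found = find_env_script(base_path, "random-lines", "1.0")
    found_relative = os.path.relpath(found, base_path)
    # A version with no directory of its own does not resolve.
    missing = find_env_script(base_path, "random-lines", "2.0")
```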
+
+ This is a full example that works since resolving "Galaxy packages" has no
+ external requirements. Try it out by executing the following command from cwltool's
+ root directory::
+
+   cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
+       tests/random_lines.cwl \
+       tests/random_lines_job.json
+
+ The resolvers configuration file in the above example was simply:
+
+ .. code:: yaml
+
+   - type: galaxy_packages
+     base_path: ./tests/test_deps_env
+
+ It is possible that the ``SoftwareRequirement`` s in a given CWL tool will not
+ match the module names for a given cluster. Such requirements can be re-mapped
+ to specific deployed packages and/or versions using another file specified using
+ the resolver plugin parameter ``mapping_files``. We will
+ demonstrate this using ``galaxy_packages``, but the concepts apply equally well
+ to Environment Modules or Conda packages (described below), for instance.
+
+ So consider the resolvers configuration file
+ (``tests/test_deps_env_resolvers_conf_rewrite.yml``):
+
+ .. code:: yaml
+
+   - type: galaxy_packages
+     base_path: ./tests/test_deps_env
+     mapping_files: ./tests/test_deps_mapping.yml
+
+ And the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):
+
+ .. code:: yaml
+
+   - from:
+       name: randomLines
+       version: 1.0.0-rc1
+     to:
+       name: random-lines
+       version: '1.0'
+
+ This says that if cwltool encounters a requirement of ``randomLines`` at version
+ ``1.0.0-rc1`` in a tool, it should rewrite it for our specific plugin as
+ ``random-lines`` at version ``1.0``. cwltool has a test tool called
+ ``random_lines_mapping.cwl`` that contains such a source ``SoftwareRequirement``.
+ To try out this example with mapping, execute the following command from the
+ cwltool root directory::
+
+   cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
+       tests/random_lines_mapping.cwl \
+       tests/random_lines_job.json
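+
+ The rewrite described by the mapping file can be pictured with the following
+ Python sketch. It is a simplified model, not galaxy-lib's implementation: the
+ function name ``apply_mappings`` is hypothetical, and the rule that a ``from``
+ entry with a version only matches that exact version is an assumption.
+

```python
def apply_mappings(name, version, mappings):
    """Return the (name, version) pair after applying the first matching rule."""
    for rule in mappings:
        source = rule["from"]
        if source["name"] != name:
            continue
        # Assumption: a rule that names a version only matches that version.
        if "version" in source and source["version"] != version:
            continue
        target = rule["to"]
        return target.get("name", name), target.get("version", version)
    # No rule matched, so the requirement is passed through unchanged.
    return name, version


# Same data as tests/test_deps_mapping.yml, written as Python literals.
mappings = [
    {
        "from": {"name": "randomLines", "version": "1.0.0-rc1"},
        "to": {"name": "random-lines", "version": "1.0"},
    }
]

rewritten = apply_mappings("randomLines", "1.0.0-rc1", mappings)
untouched = apply_mappings("seqtk", "r93", mappings)
```
+
+ In this model a requirement with no matching rule, such as ``seqtk`` at
+ ``r93``, reaches the resolver exactly as the tool declared it.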
+
+ The previous examples demonstrated leveraging existing infrastructure to
+ provide requirements for CWL tools. If instead a real package manager is used,
+ cwltool has the opportunity to install requirements as needed. While initial
+ support for Homebrew/Linuxbrew plugins is available, the most developed such
+ plugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties
+ of allowing multiple versions of a package to be installed simultaneously,
+ not requiring elevated permissions to install Conda itself or packages using
+ Conda, and being cross-platform. For these reasons, cwltool may run as a normal
+ user, install its own Conda environment and manage multiple versions of Conda packages
+ on both Linux and Mac OS X.
+
+ The Conda plugin is highly configurable, but a sensible set of defaults
+ that has proven a powerful stack for dependency management within the Galaxy tool
+ development ecosystem can be enabled by simply passing cwltool the
+ ``--beta-conda-dependencies`` flag.
+
+ With this we can use the seqtk example above without Docker and without
+ any externally managed services; cwltool should install everything it needs
+ and create an environment for the tool. Try it out with the following command::
+
+   cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
+
+ The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s
+ that allow disambiguation of package names. If the mapping files described above
+ allow deployers to adapt tools to their infrastructure, this mechanism allows
+ tools to adapt their requirements to multiple package managers. To demonstrate
+ this within the context of seqtk, we can simply break the package name we
+ use and then specify a specific Conda package as follows:
+
+ .. code:: yaml
+
+   hints:
+     SoftwareRequirement:
+       packages:
+       - package: seqtk_seq
+         version:
+         - '1.2'
+         specs:
+         - https://anaconda.org/bioconda/seqtk
+         - https://packages.debian.org/sid/seqtk
+
+ The example can be executed using the command::
+
+   cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
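+
+ To make the idea of ``specs`` URIs concrete, the sketch below picks a Conda
+ channel and package name out of a list of URIs. It assumes that an
+ ``anaconda.org`` URI has the form ``https://anaconda.org/<channel>/<package>``;
+ the function name ``conda_target_from_specs`` is hypothetical, and the actual
+ matching logic used by cwltool and galaxy-lib may differ.
+

```python
from urllib.parse import urlparse


def conda_target_from_specs(specs):
    """Return (channel, package) from the first anaconda.org URI, or None."""
    for uri in specs:
        parsed = urlparse(uri)
        if parsed.netloc != "anaconda.org":
            # URIs for other package managers are ignored in this sketch.
            continue
        parts = [part for part in parsed.path.split("/") if part]
        if len(parts) == 2:
            return parts[0], parts[1]
    return None


# The URIs from the ``specs`` list shown above.
specs = [
    "https://anaconda.org/bioconda/seqtk",
    "https://packages.debian.org/sid/seqtk",
]

target = conda_target_from_specs(specs)
no_conda = conda_target_from_specs(["https://packages.debian.org/sid/seqtk"])
```
+
+ Here the deliberately wrong package name ``seqtk_seq`` plays no role: the
+ Conda package is identified from the URI alone.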
+
+ The plugin framework for managing resolution of these software requirements
+ is maintained as part of `galaxy-lib <https://github.com/galaxyproject/galaxy-lib>`__, a small, portable subset of the Galaxy
+ project. More information on configuration and implementation can be found
+ at the following links:
+
+ - `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
+ - `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
+ - `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
+ - `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
+ - `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__

Cwltool control flow
--------------------