Commit f32e77e

Add a batch-create management command (#1509)
Signed-off-by: tdruez <[email protected]>
1 parent cf651f1 commit f32e77e

10 files changed

+417
-29
lines changed

CHANGELOG.rst

Lines changed: 4 additions & 0 deletions
@@ -31,6 +31,10 @@ v34.9.4 (unreleased)
3131
The labels are now always presented in alphabetical order for consistency.
3232
https://github.com/aboutcode-org/scancode.io/issues/1520
3333

34+
- Add a ``batch-create`` management command to create multiple projects
35+
at once from a directory containing input files.
36+
https://github.com/aboutcode-org/scancode.io/issues/1437
37+
3438
v34.9.3 (2024-12-31)
3539
--------------------
3640

docs/command-line-interface.rst

Lines changed: 91 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,7 @@ ScanPipe's own commands are listed under the ``[scanpipe]`` section::
5757
add-input
5858
add-pipeline
5959
archive-project
60+
batch-create
6061
check-compliance
6162
create-project
6263
create-user
@@ -83,7 +84,8 @@ For example::
8384
$ scanpipe create-project --help
8485
usage: scanpipe create-project [--input-file INPUTS_FILES]
8586
[--input-url INPUT_URLS] [--copy-codebase SOURCE_DIRECTORY]
86-
[--pipeline PIPELINES] [--execute] [--async]
87+
[--pipeline PIPELINES] [--label LABELS] [--notes NOTES]
88+
[--execute] [--async]
8789
name
8890

8991
Create a ScanPipe project.
@@ -124,6 +126,10 @@ Optional arguments:
124126
- ``--copy-codebase SOURCE_DIRECTORY`` Copy the content of the provided source directory
125127
into the :guilabel:`codebase/` work directory.
126128

129+
- ``--notes NOTES`` Optional notes about the project.
130+
131+
- ``--label LABELS`` Optional labels for the project.
132+
127133
- ``--execute`` Execute the pipelines right after project creation.
128134

129135
- ``--async`` Add the pipeline run to the tasks queue for execution by a worker instead
@@ -133,6 +139,90 @@ Optional arguments:
133139
.. warning::
134140
Pipelines are added and are executed in order.
135141

142+
.. _cli_batch_create:
143+
144+
`$ scanpipe batch-create [--input-directory INPUT_DIRECTORY] [--input-list FILENAME.csv]`
145+
-----------------------------------------------------------------------------------------
146+
147+
Processes files from the specified ``INPUT_DIRECTORY`` or rows from ``FILENAME.csv``,
148+
creating a project for each file or row.
149+
150+
- Use ``--input-directory`` to specify a local directory. Each file in the directory
151+
will result in a project, uniquely named using the filename and a timestamp.
152+
153+
- Use ``--input-list`` to specify a ``FILENAME.csv``. Each row in the CSV will be used
154+
to create a project based on the data provided.
155+
156+
Supports specifying pipelines and asynchronous execution.
157+
158+
Required arguments (one of):
159+
160+
- ``input-directory`` The path to the directory containing the input files to process.
161+
Ensure the directory exists and contains the files you want to use.
162+
163+
- ``input-list`` Path to a CSV file with project names and input URLs.
164+
The first column must contain project names, and the second column should list
165+
comma-separated input URLs (e.g., Download URL, PURL, or Docker reference).
166+
167+
**CSV content example**:
168+
169+
+----------------+---------------------------------+
170+
| project_name | input_urls |
171+
+================+=================================+
172+
| project-1 | https://url.com/file.ext |
173+
+----------------+---------------------------------+
174+
| project-2 | pkg:deb/debian/[email protected] |
175+
+----------------+---------------------------------+
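The two-column layout shown in the table above could be read along these lines. This is a minimal sketch, not the command's actual implementation: the function name ``read_project_rows`` is hypothetical, and it assumes the file starts with a ``project_name,input_urls`` header row, as in the example.

```python
import csv
import io


def read_project_rows(csv_text):
    """Return a list of (project_name, input_urls) tuples from CSV text.

    Hypothetical helper: assumes a header row naming the two columns
    ``project_name`` and ``input_urls``.
    """
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        name = (row.get("project_name") or "").strip()
        if not name:
            # Skip rows that do not provide a project name.
            continue
        raw_urls = row.get("input_urls") or ""
        # The second column may hold several comma-separated URLs.
        urls = [url.strip() for url in raw_urls.split(",") if url.strip()]
        rows.append((name, urls))
    return rows


sample = (
    "project_name,input_urls\n"
    "project-1,https://url.com/file.ext\n"
    'project-2,"https://url.com/a.ext, https://url.com/b.ext"\n'
)
for name, urls in read_project_rows(sample):
    print(name, urls)
```

Because the URLs are themselves comma-separated, a cell that lists more than one URL has to be quoted in the CSV file, as in the ``project-2`` row of the sample.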
176+
177+
Optional arguments:
178+
179+
- ``--project-name-suffix`` Optional custom suffix to append to project names.
180+
If not provided, a timestamp (in the format [YYMMDD_HHMMSS]) will be used.
181+
182+
- ``--pipeline PIPELINES`` Pipeline names to add to the project.
183+
184+
- ``--notes NOTES`` Optional notes about the project.
185+
186+
- ``--label LABELS`` Optional labels for the project.
187+
188+
- ``--execute`` Execute the pipelines right after project creation.
189+
190+
- ``--async`` Add the pipeline run to the tasks queue for execution by a worker instead
191+
of running in the current thread.
192+
Applies only when ``--execute`` is provided.
193+
194+
Example: Processing Multiple Docker Images
195+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
196+
197+
Assume multiple Docker images are available in a directory named ``local-data/`` on
198+
the host machine.
199+
To process these images with the ``analyze_docker_image`` pipeline using asynchronous
200+
execution::
201+
202+
$ docker compose run --rm \
203+
--volume local-data/:/input-data:ro \
204+
web scanpipe batch-create /input-data/ \
205+
--pipeline analyze_docker_image \
206+
--label "Docker" \
207+
--execute --async
208+
209+
**Explanation**:
210+
211+
- ``local-data/``: A directory on the host machine containing the Docker images to
212+
process.
213+
- ``/input-data/``: The directory inside the container where ``local-data/`` is
214+
mounted (read-only).
215+
- ``--pipeline analyze_docker_image``: Specifies the ``analyze_docker_image``
216+
pipeline for processing each Docker image.
217+
- ``--label "Docker"``: Tags all the projects with the "Docker" label to enable
218+
easy search and filtering.
219+
- ``--execute``: Runs the pipeline immediately after creating a project for each
220+
image.
221+
- ``--async``: Adds the pipeline run to the worker queue for asynchronous execution.
222+
223+
Each Docker image in the ``local-data/`` directory will result in the creation of a
224+
project with the specified pipeline (``analyze_docker_image``) executed by worker
225+
services.
136226

137227
`$ scanpipe list-pipeline [--verbosity {0,1,2,3}]`
138228
--------------------------------------------------
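The naming rule documented for ``batch-create`` (one project per file, named from the filename plus either ``--project-name-suffix`` or a ``YYMMDD_HHMMSS`` timestamp) can be sketched as follows. This is an illustration under stated assumptions rather than the command's code: ``build_project_names`` is a hypothetical helper, the single-space separator between filename and suffix is a guess, and skipping sub-directories is also an assumption.

```python
from datetime import datetime
from pathlib import Path


def build_project_names(input_directory, suffix=None, now=None):
    """Return one project name per regular file found in `input_directory`."""
    if suffix is None:
        # Default suffix: a timestamp in the documented YYMMDD_HHMMSS format.
        now = now or datetime.now()
        suffix = now.strftime("%y%m%d_%H%M%S")

    directory = Path(input_directory)
    # Sorted for a stable order; sub-directories are skipped (assumption).
    files = sorted(path for path in directory.iterdir() if path.is_file())
    # Assumption: the filename and the suffix are joined with a single space.
    return [f"{path.name} {suffix}" for path in files]
```

Computing the suffix once per run, rather than once per file, keeps every project created by the same invocation grouped under the same timestamp.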

docs/faq.rst

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,33 @@ It does not compute such summary.
108108
You can also have a look at the different steps for each pipeline from the
109109
:ref:`built_in_pipelines` documentation.
110110

111+
How to create multiple projects at once?
112+
-----------------------------------------
113+
114+
You can use the :ref:`cli_batch_create` command to create multiple projects
115+
simultaneously.
116+
This command processes all files in a specified input directory, creating one project
117+
per file.
118+
Each project is uniquely named using the file name and a timestamp by default.
119+
120+
For example, to create multiple projects from files in a directory named
121+
``local-data/``::
122+
123+
$ docker compose run --rm \
124+
--volume local-data/:/input-data:ro \
125+
web scanpipe batch-create /input-data/
126+
127+
**Options**:
128+
129+
- **Custom Pipelines**: Use the ``--pipeline`` option to add specific pipelines to the
130+
projects.
131+
- **Asynchronous Execution**: Add ``--execute`` and ``--async`` to queue pipeline
132+
execution for worker processing.
133+
- **Project Notes and Labels**: Use ``--notes`` and ``--label`` to include metadata.
134+
135+
Each file in the input directory will result in the creation of a corresponding project,
136+
ready for pipeline execution.
137+
111138
Can I run multiple pipelines in parallel?
112139
-----------------------------------------
113140

@@ -279,7 +306,7 @@ data older than 7 days::
279306
See :ref:`command_line_interface` chapter for more information about the scanpipe
280307
command.
281308

282-
How can I provide my license policies ?
309+
How can I provide my license policies?
283310
---------------------------------------
284311

285312
For detailed information about the policies system, refer to :ref:`policies`.

scanpipe/management/commands/__init__.py

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -150,6 +150,28 @@ def display_status(self, project, verbosity):
150150
self.stdout.write(line)
151151

152152

153+
class PipelineCommandMixin:
154+
def add_arguments(self, parser):
155+
super().add_arguments(parser)
156+
parser.add_argument(
157+
"--pipeline",
158+
action="append",
159+
dest="pipelines",
160+
default=list(),
161+
help=(
162+
"Pipeline names to add to the project. "
163+
"The pipelines are added and executed based on their given order. "
164+
'Groups can be provided using the "pipeline_name:option1,option2" '
165+
"syntax."
166+
),
167+
)
168+
parser.add_argument(
169+
"--execute",
170+
action="store_true",
171+
help="Execute the pipelines right after the project creation.",
172+
)
173+
174+
153175
class AddInputCommandMixin:
154176
def add_arguments(self, parser):
155177
super().add_arguments(parser)
@@ -427,6 +449,7 @@ def create_project(
427449
input_urls=None,
428450
copy_from="",
429451
notes="",
452+
labels=None,
430453
execute=False,
431454
run_async=False,
432455
command=None,
@@ -451,6 +474,10 @@ def create_project(
451474
)
452475

453476
project.save()
477+
478+
if labels:
479+
project.labels.add(*labels)
480+
454481
if command:
455482
command.project = project
456483

@@ -491,6 +518,20 @@ def execute_project(self, run_async=False):
491518

492519

493520
class CreateProjectCommandMixin(ExecuteProjectCommandMixin):
521+
def add_arguments(self, parser):
522+
super().add_arguments(parser)
523+
parser.add_argument(
524+
"--notes",
525+
help="Optional notes about the project.",
526+
)
527+
parser.add_argument(
528+
"--label",
529+
action="append",
530+
dest="labels",
531+
default=list(),
532+
help="Optional labels for the project.",
533+
)
534+
494535
def create_project(
495536
self,
496537
name,
@@ -499,16 +540,21 @@ def create_project(
499540
input_urls=None,
500541
copy_from="",
501542
notes="",
543+
labels=None,
502544
execute=False,
503545
run_async=False,
504546
):
547+
if execute and not pipelines:
548+
raise CommandError("The --execute option requires one or more pipelines.")
549+
505550
return create_project(
506551
name=name,
507552
pipelines=pipelines,
508553
input_files=input_files,
509554
input_urls=input_urls,
510555
copy_from=copy_from,
511556
notes=notes,
557+
labels=labels,
512558
execute=execute,
513559
run_async=run_async,
514560
command=self,
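The option handling added in this file can be tried outside Django with plain ``argparse``. The sketch below mirrors the ``--pipeline``, ``--label``, ``--notes`` and ``--execute`` definitions from the diff; ``build_parser`` and ``check_options`` are hypothetical names, and ``ValueError`` stands in for Django's ``CommandError`` so that the example stays self-contained.

```python
import argparse


def build_parser():
    """Build a parser with the repeatable options introduced in the diff."""
    parser = argparse.ArgumentParser(prog="sketch")
    # action="append" collects every occurrence of the flag into one list.
    parser.add_argument("--pipeline", action="append", dest="pipelines", default=[])
    parser.add_argument("--label", action="append", dest="labels", default=[])
    parser.add_argument("--notes")
    parser.add_argument("--execute", action="store_true")
    return parser


def check_options(options):
    """Reject --execute when no pipeline was provided, as in create_project()."""
    if options.execute and not options.pipelines:
        # ValueError replaces CommandError in this standalone sketch.
        raise ValueError("The --execute option requires one or more pipelines.")
    return options


options = build_parser().parse_args(
    ["--pipeline", "analyze_docker_image", "--label", "Docker", "--execute"]
)
check_options(options)
print(options.pipelines, options.labels)
```

Validating before the project is created means that an invalid ``--execute`` call fails early instead of leaving behind a project with nothing to run.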
