RFC Set cachedir and backupcachedir as parameter for parallel instances of the tool with own database #4773
base: main
I think this is probably a change we should have. BUT for parallel runs, our standard recommendation is that you don't allow scan jobs to update the database, which avoids this problem, so you might want to switch to doing that. That is:
Our docs actually recommend you use
I'm wondering if we should make some changes so this works better, or make it more obvious to users that this is the recommended solution. I'll open an issue so someone could maybe work on that.
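A minimal Python sketch of that recommendation, assuming cve-bin-tool's `-u`/`--update` option accepts `never` (check `cve-bin-tool --help` for your installed version). The helper names `scan_command` and `run_parallel_scans` are hypothetical. The database is refreshed beforehand by one separate job, and the parallel scans only read it:

```python
import subprocess
import sys
from typing import List, Sequence


def scan_command(target: str) -> List[str]:
    """Build a scan-only command line that leaves the CVE database alone.

    Assumption: `-u never` disables database updates; confirm with
    `cve-bin-tool --help` for the installed version.
    """
    return ["cve-bin-tool", "-u", "never", target]


def run_parallel_scans(targets: Sequence[str]) -> List[int]:
    # Start all scans at once. None of them updates cve.db, so they only
    # read the database that a separate update job refreshed beforehand.
    procs = [subprocess.Popen(scan_command(t)) for t in targets]
    # Wait for every scan and collect the exit codes in target order.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    exit_codes = run_parallel_scans(sys.argv[1:])
    sys.exit(max(exit_codes, default=0))
```

Because no scan writes to cve.db, a scan that starts late cannot trigger an update that rolls back the database another scan is still using.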
Hi @terriko, the other thing is that, at the moment, you cannot set a different path for the CVE database. The different CVE sources and files always use the compiled-in default path ~/.cache/cve-bin-tool/cve.db, so you cannot run a pipeline with its own cve.db in the workspace. If you copy the database to the workspace and use CVEDB, which database path is used depends on the data request. If there is no database at ~/.cache/cve-bin-tool/cve.db, the tool crashes, for example.
Jenkins would be a bit different, but we're using the cache functionality of GitHub Actions:
https://github.com/intel/cve-bin-tool/blob/main/.github/workflows/update-cache.yml
https://github.com/intel/cve-bin-tool/actions/workflows/update-cache.yml
So the update job does the update and then clobbers the cache, and the scan jobs each copy the cache to their local directory before they start. Because GitHub Actions uses a new container for every job, the individual scan/test jobs can't share data any other way.
That probably doesn't help for the case where you're reusing the same machine/container instead of fresh ones, but it probably explains why the cache directory hasn't been fixed yet.
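The update-then-copy pattern described above can be sketched in Python as follows. This is only an illustration: `make_job_cache` and the `<workspace>/<job_id>/cve-bin-tool` layout are hypothetical, and on a reused machine a per-job copy only helps once the tool can be pointed at that directory, which is what this PR tries to add.

```python
import shutil
import tempfile
from pathlib import Path


def make_job_cache(shared_cache: Path, job_id: str, workspace: Path) -> Path:
    """Copy the already-updated shared cache into a directory owned by one job.

    Hypothetical layout: <workspace>/<job_id>/cve-bin-tool. Only the update
    job writes to shared_cache; scan jobs read and modify their own copy.
    """
    job_cache = workspace / job_id / "cve-bin-tool"
    if job_cache.exists():
        # Remove a stale copy left over from an earlier run of the same job.
        shutil.rmtree(job_cache)
    # copytree creates the missing parent directories as needed.
    shutil.copytree(shared_cache, job_cache)
    return job_cache


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared-cache"
        shared.mkdir()
        (shared / "cve.db").write_bytes(b"placeholder")
        for job in ("scan-1", "scan-2"):
            print(make_job_cache(shared, job, root / "workspace"))
```

Each job can then damage or roll back only its own copy, never the shared cache or another job's database.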
Anyhow, back to this PR: do you have time to fix the linter issues? You probably need to run isort and black and just let them auto-fix. It would probably be good to update the branch at the same time, since we made some CI changes that will probably make the tests run faster for you. I can do that through the web interface, but I don't want to get your local branch all out of sync if you're going to fix the linter issues.
Oh, I forgot the link for how to run the linters and what they're doing:
The cachedir and backupcachedir of the data sources were always set to their default values, while the cachedir of the cvedb is configurable. With this patch, cve-bin-tool can run as several instances, each with its own cache for the CVE information. Additionally, the patch is a workaround for the cvedb access in line 441, purl2cpe_conn = sqlite3.connect(self.cachedir / "purl2cpe/purl2cpe.db"), which fails if the cvedb uses a different cachedir than purl2cpe, which uses DISK_LOCATION_DEFAULT.
Signed-off-by: Maik Otto <[email protected]>
Branch updated from 57585c0 to 7ec5658
Hi @terriko,
The cachedir and backup_cachedir of the data sources are always set to the default value (~/.cache/cve-bin-tool).
For the CVEDB, the cachedir is configurable, but the data_sources, the cve_scanner and other modules always use the default cachedir path.
If you set the cachedir of the CVEDB to a different path, then the cvedb access in line 441
purl2cpe_conn = sqlite3.connect(self.cachedir / "purl2cpe/purl2cpe.db")
fails, because the purl2cpe database is stored under DISK_LOCATION_DEFAULT instead of under the cvedb's self.cachedir.
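A minimal sketch of the direction this patch takes. `DataSource`, `CVEDBSketch` and the source names are hypothetical stand-ins, not cve-bin-tool's real classes, and `DISK_LOCATION_DEFAULT` is redefined locally. The CVEDB owns one cachedir and hands it to every data source, and the purl2cpe connection is resolved against that same directory, so nothing silently falls back to the default path:

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Optional

# Local stand-in for the tool's compiled-in default (~/.cache/cve-bin-tool).
DISK_LOCATION_DEFAULT = Path.home() / ".cache" / "cve-bin-tool"


class DataSource:
    """Hypothetical data source that receives its cache directory explicitly."""

    def __init__(self, name: str, cachedir: Optional[Path] = None) -> None:
        self.name = name
        # Fall back to the default only when no directory is given.
        self.cachedir = Path(cachedir) if cachedir is not None else DISK_LOCATION_DEFAULT


class CVEDBSketch:
    """Hypothetical CVEDB stand-in that owns one cachedir for all sources."""

    def __init__(self, cachedir: Optional[Path] = None) -> None:
        self.cachedir = Path(cachedir) if cachedir is not None else DISK_LOCATION_DEFAULT
        # Pass the same directory to every source instead of letting each
        # one use DISK_LOCATION_DEFAULT on its own.
        self.sources: List[DataSource] = [
            DataSource("nvd", self.cachedir),
            DataSource("purl2cpe", self.cachedir),
        ]

    def purl2cpe_connect(self) -> sqlite3.Connection:
        # Resolved against this instance's cachedir, i.e. where the
        # purl2cpe source above would store its database.
        return sqlite3.connect(self.cachedir / "purl2cpe" / "purl2cpe.db")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cvedb = CVEDBSketch(cachedir=Path(tmp))
        (cvedb.cachedir / "purl2cpe").mkdir()
        conn = cvedb.purl2cpe_connect()
        conn.close()
        print(all(s.cachedir == cvedb.cachedir for s in cvedb.sources))
```

A backup_cachedir would be threaded through the same way, as a second constructor parameter passed down to each source.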
The motivation for this RFC patch is to run several instances of cve-bin-tool, each with its own cachedir.
In parallel operation, there is a real risk of collisions when accessing the database: if a task starts later and wants to update the database, the first task rolls back the database, which corrupts the entire database.
This RFC patch tries to set the cachedir in all affected files, but many files and dependencies are affected.
I am not sure if this is the right approach to achieve better parallel processing.