CheckM2 Data Manager writes to CheckM2 install dir #6717
Comments
I can confirm that a modified
Fetched the DB using a writable conda install of checkm2:
$ checkm2 database --download --path $(pwd)/data
[02/04/2025 12:34:01 PM] INFO: Command: Download database. Checking internal path information.
[02/04/2025 12:34:03 PM] INFO: Downloading https://zenodo.org/api/records/5571251/files/checkm2_database.tar.gz/content to /home/nate/xplor/checkm2/data/checkm2_database.tar.gz.
100%|████████| 1.74G/1.74G [01:05<00:00, 26.4MiB/s]
[02/04/2025 12:35:09 PM] INFO: Extracting files from archive...
[02/04/2025 12:35:24 PM] INFO: Verifying version and checksums...
[02/04/2025 12:35:24 PM] INFO: Verification success.
[02/04/2025 12:35:25 PM] INFO: Diamond DATABASE downloaded successfully! Consider running <checkm2 testrun> to verify everything works.
The json in the conda env just contains the path to the database:
$ cat ~/.condas/24.11/envs/[email protected]/lib/python3.8/site-packages/checkm2/version/diamond_path.json
{"Type": "DIAMONDDB", "DBPATH": "/home/nate/xplor/checkm2/data/CheckM2_database/uniref100.KO.1.dmnd"}
But this appears to be ignored when
$ apptainer -s exec --cleanenv -B $(pwd)/data:/data:ro -B $(pwd)/input:/input:ro -B $(pwd)/output:/output /cvmfs/singularity.galaxyproject.org/all/checkm2:1.0.2--pyh7cba7a3_0 checkm2 predict --input /input --allmodels --genes --ttable 13 -x .faa --threads 1 --database_path /data/CheckM2_database/uniref100.KO.1.dmnd --output-directory /output
[02/04/2025 12:48:21 PM] INFO: Running CheckM2 version 1.0.2
[02/04/2025 12:48:21 PM] INFO: Custom database path provided for predict run. Checking database at /data/CheckM2_database/uniref100.KO.1.dmnd...
[02/04/2025 12:48:23 PM] INFO: Running quality prediction workflow with 1 threads.
[02/04/2025 12:48:24 PM] INFO: Using user-supplied protein files.
[02/04/2025 12:48:24 PM] INFO: Calculating metadata for 2 bins with 1 threads:
Finished processing 2 of 2 (100.00%) bin metadata.
[02/04/2025 12:48:25 PM] INFO: Annotating input genomes with DIAMOND using 1 threads
[02/04/2025 12:50:07 PM] INFO: Processing DIAMOND output
[02/04/2025 12:50:07 PM] INFO: Predicting completeness and contamination using ML models.
[02/04/2025 12:50:10 PM] INFO: Parsing all results and constructing final output table.
[02/04/2025 12:50:10 PM] INFO: CheckM2 finished successfully.
Is there an upstream issue? It's hardcoded here:
Wondering if we can hack this by copying the module to the working dir and prepending it to PYTHONPATH.
Alternatively: can we ignore the error using stdio?
Yes, the issue is upstream. IMO we should just wait and only implement our own workaround if there isn't an upstream fix. I already worked around it for my own use (run in conda, throw the env away), so there is no urgency from me.
We can also change the DM and use a tool such as https://github.com/dvolgyes/zenodo_get. CheckM2 downloads the database from Zenodo too, so we can use the record ID (5571251 for the current version) to download the file and unpack it, since it is a tar.gz archive. Maybe this is a good solution until they fix it? I can write a conda recipe and test this tool with the current DM if this is wanted, since there is not much to adjust in the current wrapper :) Zenodo also has an API, but I cannot tell how well it would work as a solution.
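As a rough illustration of the "download from Zenodo and unpack" idea, here is a minimal Python sketch that uses only the standard library rather than zenodo_get. The URL pattern, record ID, and archive name are copied from the download log earlier in this thread; the function names and the choice to return the `.dmnd` paths are assumptions made for this sketch, not part of the DM or of CheckM2.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

# Record ID and archive name as they appear in the CheckM2 download log.
ZENODO_RECORD_ID = "5571251"
ARCHIVE_NAME = "checkm2_database.tar.gz"


def zenodo_file_url(record_id: str, filename: str) -> str:
    """Build the Zenodo API content URL for one file of a record."""
    return f"https://zenodo.org/api/records/{record_id}/files/{filename}/content"


def extract_archive(archive: Path, target_dir: Path) -> List[Path]:
    """Unpack a .tar.gz archive into target_dir and return the .dmnd files found."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)
    return sorted(target_dir.rglob("*.dmnd"))


def fetch_database(target_dir: Path) -> List[Path]:
    """Hypothetical helper: download the archive, unpack it, return .dmnd paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / ARCHIVE_NAME
    url = zenodo_file_url(ZENODO_RECORD_ID, ARCHIVE_NAME)
    urllib.request.urlretrieve(url, str(archive))
    return extract_archive(archive, target_dir)
```

Nothing here touches the CheckM2 install directory, so the resulting path could be passed to `--database_path` as in the apptainer run above. Checksum verification, which the CheckM2 downloader performs, is left out of this sketch.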
Hi, author of CheckM2 here. What's your preferred alternative? The original reason for the hardcoding was to enable easy central installation of a conda environment that lots of users can activate (the main use case in our lab) and use without needing their own config file for the database path (which, yes, can just be bypassed). The idea was that the admin makes the initial changes (the database has to be downloaded anyway for CheckM2 to work) and modifies the path in the json file in the install directory; anyone can then use the tool by simply calling 'conda activate central_checkm2_environment', with no further work required from them.
What's the best alternative for your use case? Move the json to the user's home dir? Have the admin install the database in a different directory somewhere, then export a CHECKM2_DB upon conda env activation? Other? I'm currently updating for a new release, so I'm happy to incorporate suggested changes so it works for you as well.
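To make the alternatives concrete, here is a small Python sketch of one possible lookup order: an explicit command-line path first, then a `CHECKM2_DB` environment variable, then one or more json files (for example one in the user's home dir, then the one in the install directory). The json layout matches the `diamond_path.json` shown earlier in the thread; the function names and the precedence order are assumptions made for illustration and do not describe how CheckM2 actually resolves the path.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def read_dbpath(config_file: Path) -> Optional[str]:
    """Return DBPATH from a diamond_path.json-style file, or None if unusable."""
    try:
        with open(config_file) as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: treat as "no path configured".
        return None
    if isinstance(data, dict):
        return data.get("DBPATH")
    return None


def resolve_db_path(
    cli_path: Optional[str],
    config_files: Sequence[Path],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical lookup order: CLI argument, CHECKM2_DB, then json files."""
    if env is None:
        env = os.environ
    if cli_path:
        return cli_path
    if env.get("CHECKM2_DB"):
        return env["CHECKM2_DB"]
    for config_file in config_files:
        db_path = read_dbpath(config_file)
        if db_path:
            return db_path
    return None
```

With an order like this, the central-install use case keeps working (the admin sets the json or exports the variable on env activation), while read-only containers never need to write anything.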
Anything that avoids the non-zero exit code would be fine for us. I guess the easiest option would be to catch the
Alternatively, add a flag that just disables the writing of the json file. This might be even cleaner.
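Both suggestions can be sketched in a few lines of Python. The function below is hypothetical and is not CheckM2's code: the `write_config` switch stands in for the proposed flag, and since the exception name is cut off in the comment above, catching `OSError` (which covers permission and read-only filesystem errors) is this sketch's own assumption.

```python
import json
import logging
from pathlib import Path


def record_db_path(db_path: str, config_file: Path, write_config: bool = True) -> bool:
    """Try to store the database path; never fail the run if that is impossible.

    Returns True if the json file was written, False otherwise.
    """
    if not write_config:
        # Stand-in for the proposed flag: skip the write entirely.
        return False
    payload = {"Type": "DIAMONDDB", "DBPATH": db_path}
    try:
        config_file.write_text(json.dumps(payload))
    except OSError as error:
        # Assumed behaviour: warn and continue, so the exit code stays zero.
        logging.warning("Could not update %s: %s", config_file, error)
        return False
    return True
```

Either branch leaves the process free to exit with status zero when the install directory is read-only, which is the only requirement stated above.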
An environment variable is not really necessary as long as
This is not the DM's fault: CheckM2 itself does it, and as far as I can tell there's no way to override it. However, it makes running with Singularity impossible and running with Conda inadvisable (since it modifies your installation).
I'll look at this, although even if I can prevent it from writing to the internal diamond_path.json (which does exist), I don't know if the tool will work if this file isn't updated.