Add an option to the file method to check for md5sum #2491

Open
JoseEspinosa opened this issue Dec 14, 2021 · 5 comments · May be fixed by #4415 or #5807

Comments

@JoseEspinosa
Contributor

After discussion on the nf-core Slack, we (@ewels, @mahesh-panchal) think it would be useful to add a native option to the file method to check the integrity of staged files.
Using the example shown in the documentation here, an option, which might be named checksum, would let the user provide the hash as in the code below:

pdb = file('http://files.rcsb.org/header/5FID.pdb', checksum: 'ba45addcc599af2ac71492f0f55da866')

The idea is that the hash of the file is calculated both when the file is staged and when it is already present in the cache from a previous execution. If the hash does not match the one provided by the user, an exception is raised, similar to what happens when the checkIfExists option is set to true and the file is not found on the system.
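To make the proposed behaviour concrete, here is a minimal sketch in Python rather than Nextflow, since the checksum option does not exist yet. The names `verify_checksum`, `file_md5` and `ChecksumMismatchError` are hypothetical and only illustrate the idea of hashing the local copy and raising when the digest differs from the one supplied by the user.

```python
import hashlib
from pathlib import Path


class ChecksumMismatchError(Exception):
    """Hypothetical error, analogous to the failure raised by checkIfExists."""


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected):
    """Hash the local copy (freshly staged or cached) and compare it.

    Raises ChecksumMismatchError when the digests differ; otherwise
    returns the path so the call can be chained like file(...).
    """
    actual = file_md5(path)
    if actual != expected.strip().lower():
        raise ChecksumMismatchError(
            f"{path}: expected md5 {expected}, got {actual}"
        )
    return Path(path)
```

Because the check runs on the local copy, it would apply equally to a file that was just downloaded and to one reused from a previous run.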

@YPHa

YPHa commented Feb 1, 2022

I was recently looking for such functionality. But files from Ensembl should be hashed using "sum" instead of md5sum. So some flexibility in this regard would also be highly appreciated (maybe with an additional argument on which kind of hash to make?)

@stale

stale bot commented Jul 10, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stevekm
Contributor

stevekm commented Oct 16, 2023

to save a little time, might consider using something faster like sha1 instead of md5 for this

@bentsherman
Member

We might need to support both anyway since the checksums might be provided in either format

@jordeu
Collaborator

jordeu commented Oct 16, 2023

I'll add support for md5, sha256 and sha1
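As a rough illustration of supporting several algorithms, the Python sketch below accepts md5, sha1 or sha256 by name and, when no name is given, guesses the algorithm from the length of the hex digest. Guessing by length is an assumption of this sketch, not something proposed in the thread, and the function names are hypothetical. The Unix `sum` checksum mentioned above is not covered, because `hashlib` does not provide it.

```python
import hashlib

# Algorithms named in the thread.
SUPPORTED = ("md5", "sha1", "sha256")

# Hex digest length for each algorithm (assumption: checksums are hex strings).
_HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def guess_algorithm(checksum):
    """Guess the algorithm from the hex digest length."""
    try:
        return _HEX_LENGTHS[len(checksum.strip())]
    except KeyError:
        raise ValueError(f"cannot infer algorithm from checksum: {checksum!r}")


def compute_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file for one of the supported algorithms."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, checksum, algorithm=None):
    """Return True when the file's digest equals the given checksum."""
    algorithm = algorithm or guess_algorithm(checksum)
    return compute_digest(path, algorithm) == checksum.strip().lower()
```

An explicit algorithm argument is the safer choice in practice, since digest length alone cannot distinguish between algorithms that share an output size.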

@bentsherman bentsherman linked a pull request Feb 21, 2025 that will close this issue