Remove UR:file:// and UR:ftp:// from ref search path, plus REF_PATH to EBI #1881
While use of the EBI refget server was originally encouraged by the CRAM inventors, it became a self-imposed DDoS and is now unreliable due to explicit rate-limiting by the EBI. This removes EBI as a fallback when REF_PATH has not been set.
In doing this we discovered that we could still retrieve references (ironically also from EBI, as the test file was a 1000 Genomes CRAM) via remote URIs in the SQ UR: tag. The samtools manpage explicitly lists this behaviour as unsupported, and we believe the ability was gained accidentally when switching from `fopen` to `bgzf_open` for reading the UR reference file.

Note that this check must be in `cram_populate_ref`, not in `load_ref_portion` or `bgzf_open_ref`, because the user can still explicitly request an external reference, e.g. via `samtools view -T URI`.

`open_path_mfile()` now takes an extra `int *local` argument, which is set to non-zero if the file found in REF_PATH is local. Non-local files will be cached to REF_CACHE if it is set, but REF_CACHE no longer has a default value, as it did when EBI refget was the default REF_PATH. This means it should operate much as before, except for the lack of EBI defaults.
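The local-versus-remote distinction described above can be sketched in C as follows. This is a minimal, hypothetical illustration, not the code from this PR: `looks_like_uri`, `ur_tag_usable`, `classify_ref_source` and `should_cache_ref` are invented names, and treating any `scheme://` prefix (including `file://`) as non-local is an assumed simplification of how htslib actually tells local files from URLs.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not htslib code): returns 1 if `path` starts with
 * a URI scheme followed by "://", e.g. "ftp://..." or "file://...". */
static int looks_like_uri(const char *path) {
    const char *sep = strstr(path, "://");
    if (!sep || sep == path || !isalpha((unsigned char)path[0]))
        return 0;
    /* Every character before "://" must be a valid scheme character. */
    for (const char *c = path; c < sep; c++) {
        if (!isalnum((unsigned char)*c) && *c != '+' && *c != '-' && *c != '.')
            return 0;
    }
    return 1;
}

/* Hypothetical policy for an SQ UR: value during a cram_populate_ref-style
 * lookup: only plain local paths are tried. An explicit reference given by
 * the user (e.g. via -T) is not subject to this check. */
static int ur_tag_usable(const char *ur) {
    return ur != NULL && *ur != '\0' && !looks_like_uri(ur);
}

/* Hypothetical stand-in for the new `int *local` out-parameter: once a
 * reference has been found via REF_PATH, record whether it is local.
 * Returns 0 on success, -1 on bad arguments. */
static int classify_ref_source(const char *found_path, int *local) {
    if (!found_path || !local)
        return -1;
    *local = !looks_like_uri(found_path);
    return 0;
}

/* Cache only non-local references, and only when REF_CACHE is set to a
 * non-empty value: there is no built-in default cache location. */
static int should_cache_ref(int local, const char *ref_cache) {
    return !local && ref_cache != NULL && *ref_cache != '\0';
}
```

A caller would pass `getenv("REF_CACHE")` as `ref_cache`, so an unset variable disables caching rather than falling back to a default directory.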