Commit f6ac1c2

daviesrob authored and pd3 committed
Pack more records in memory when sorting
As bcf1_t is quite a big structure, it adds quite a lot of overhead if the records being sorted are small (e.g. single-sample gVCF). This overhead can be reduced by storing the data in a more compact form. Variable-length encoding is used for numbers that aren't directly needed for sorting, as values are usually much smaller than the maximum possible. On a test file with approx. 61 characters per VCF line, up to four times as many records could be stored before having to spill them.

This change only affects the blocks of data sorted in memory and then written out by buf_flush(). As the merge_blocks() function writes bcf and needs far fewer records in memory at any time, partially merged files are still written in that format.
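
The variable-length encoding mentioned in the commit message can be illustrated with a minimal sketch. This is a generic little-endian base-128 varint (LEB128-style), chosen here as an assumption for illustration; the encoding actually used in vcfsort.c is not shown on this page and may differ. The function names varint_encode() and varint_decode() are hypothetical and are not part of bcftools or htslib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode val as a little-endian base-128 varint.
 * Each output byte carries 7 payload bits; the top bit is set on every
 * byte except the last. buf must have room for up to 10 bytes.
 * Returns the number of bytes written. */
static size_t varint_encode(uint64_t val, uint8_t *buf)
{
    size_t n = 0;
    assert(buf != NULL);
    while (val >= 0x80) {
        buf[n++] = (uint8_t)((val & 0x7f) | 0x80);
        val >>= 7;
    }
    buf[n++] = (uint8_t)val;
    return n;
}

/* Hypothetical helper: decode a varint from at most len bytes of buf.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than a 64-bit value allows. */
static size_t varint_decode(const uint8_t *buf, size_t len, uint64_t *val)
{
    uint64_t v = 0;
    size_t n = 0;
    unsigned shift = 0;
    assert(buf != NULL && val != NULL);
    while (n < len && shift < 64) {
        uint8_t b = buf[n++];
        v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) {
            *val = v;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

With this scheme, a value below 128 takes one byte rather than the four or eight bytes of a fixed-width integer field, which is the kind of saving the commit message describes for values that are usually much smaller than the maximum possible.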
1 parent a62defa commit f6ac1c2

2 files changed: +313 -101 lines changed

Makefile

+1 -1
@@ -259,7 +259,7 @@ vcfroh.o: vcfroh.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_kstrin
 vcfcnv.o: vcfcnv.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_kstring_h) $(htslib_kfunc_h) $(htslib_khash_str2int_h) $(bcftools_h) HMM.h rbuf.h
 vcfhead.o: vcfhead.c $(htslib_kstring_h) $(htslib_vcf_h) $(bcftools_h)
 vcfsom.o: vcfsom.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_hts_os_h) $(bcftools_h)
-vcfsort.o: vcfsort.c $(htslib_vcf_h) $(htslib_kstring_h) $(htslib_hts_os_h) kheap.h $(bcftools_h)
+vcfsort.o: vcfsort.c $(htslib_vcf_h) $(htslib_kstring_h) $(htslib_hts_os_h) $(htslib_bgzf_h) kheap.h $(bcftools_h)
 vcfstats.o: vcfstats.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_faidx_h) $(bcftools_h) $(filter_h) bin.h dist.h
 vcfview.o: vcfview.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(bcftools_h) $(filter_h) $(htslib_khash_str2int_h) $(htslib_kbitset_h)
 reheader.o: reheader.c $(htslib_vcf_h) $(htslib_bgzf_h) $(htslib_tbx_h) $(htslib_kseq_h) $(htslib_thread_pool_h) $(htslib_faidx_h) $(htslib_khash_str2int_h) $(bcftools_h) $(khash_str2str_h)
