Explain parameters #111

@martindurant

Thanks for putting this together! The kerchunk project will make great use of it.

I am still trying to get my head around how it works, given that "gzip/zlib streams are unsplittable" has been a mantra for a long time.

In this issue, however, I'd like to ask for more documentation around the arguments to IndexedGzipFile, and the tradeoffs they entail:

  • I understand spacing: the more points in the file you index, the faster random seeks will tend to be (less scrolling needed), but the bigger the index file will get. I expect this can be any number up to the size of the target file, at which point seeking is equivalent to not using indexed_gzip at all.
  • window_size: something to do with how much data is stored with each point? Can it be made small to keep the index file small, and what would be the downside of this? I don't seem to be able to pick just any number without getting a ZranError. Is 2**15 the minimum, or is this file dependent?
  • readbuf_size: if I know that I will always be reading an exact byte range every time or I implement buffering elsewhere, can this be zero?
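To make the tradeoff I am asking about concrete, here is a back-of-envelope sketch of how I currently picture it. It is not indexed_gzip code: the function name `estimate_index_tradeoff` is hypothetical, and it assumes that spacing is measured in uncompressed bytes, that each index point stores roughly window_size bytes, and that per-point offsets and any compression of the exported index can be ignored. Corrections to any of these assumptions are exactly what I hope the docs could cover.

```python
import math

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def estimate_index_tradeoff(uncompressed_size, spacing, window_size=32 * KIB):
    """Rough model of index size versus seek cost (an assumption, not the real format).

    Returns (number of index points,
             approximate bytes of window data held in the index,
             worst-case uncompressed bytes to scroll through after a seek).
    """
    # Assumed: one index point per `spacing` uncompressed bytes.
    n_points = max(1, math.ceil(uncompressed_size / spacing))
    # Assumed: each point carries about `window_size` bytes of history.
    approx_index_bytes = n_points * window_size
    # A seek lands at most one full spacing interval past the nearest point.
    worst_case_scroll = min(spacing, uncompressed_size)
    return n_points, approx_index_bytes, worst_case_scroll


if __name__ == "__main__":
    # Compare a few spacings for a hypothetical 1 GiB uncompressed stream.
    for spacing in (1 * MIB, 4 * MIB, 16 * MIB):
        points, index_bytes, scroll = estimate_index_tradeoff(GIB, spacing)
        print(f"spacing={spacing // MIB} MiB: {points} points, "
              f"~{index_bytes // MIB} MiB index, "
              f"up to {scroll // MIB} MiB scrolled per seek")
```

Under this model, quartering the spacing quadruples the index size while cutting the worst-case scroll to a quarter, which is why guidance on sensible values would be useful.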
