LongListDisk.writeLongsData() may corrupt data in snapshots #18235

Open
OlegMazurov opened this issue Mar 7, 2025 · 2 comments · May be fixed by #18250
Assignees: artemananiev
Labels: Bug, Platform Data Structures, Platform Virtual Map, Platform
Milestone: v0.61

Comments

@OlegMazurov
Contributor

OlegMazurov commented Mar 7, 2025

LongListDisk.writeLongsData() reads all current chunks from its backing file and writes them to the target file in logical order (from the minimum to the maximum index value).
Chunk file offsets are allocated dynamically, so the last logical chunk (the one with the largest index values) may sit in the middle of the backing file. Similarly, the last physical chunk may fall in the middle of the index value range.
The last physical chunk may be truncated, because the backing file grows only as far as the data actually written to it (see LongListDisk.putToChunk()). If the largest index value in that chunk is less than memoryChunkSize, the chunk is effectively truncated. When such a chunk is read into transferBuffer, the number of bytes actually read is less than the buffer's limit, and the rest of the buffer is left unchanged, possibly still holding data from the previous logical chunk. The entire buffer, including the wrong data, is then written to the target file (this was actually observed in #18136).
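
The effect can be reproduced outside of LongListDisk with a plain FileChannel. The following is a minimal sketch, not the actual LongListDisk code: the class name, the helper methods, and the chunk size of 4 longs are all assumptions made for illustration. It writes one full chunk and one truncated chunk, then reads both through a single reused buffer the way the description says transferBuffer is reused.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class; it only mimics the reuse of a transfer buffer.
class StaleTransferBufferDemo {
    static final int CHUNK_LONGS = 4; // assumed chunk size, for illustration only
    static final int CHUNK_BYTES = CHUNK_LONGS * Long.BYTES;

    // Writes the given longs at the given file offset.
    static void writeLongs(FileChannel ch, long offset, long... values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf, offset + buf.position());
        }
    }

    // Reads up to one chunk into buf and returns the number of bytes read.
    // clear() resets position and limit only; the old bytes stay in the buffer.
    static int readChunk(FileChannel ch, ByteBuffer buf, long offset) throws IOException {
        buf.clear();
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position());
            if (n < 0) {
                break; // end of file reached before the buffer was filled
            }
        }
        return buf.position();
    }

    // Returns the longs that would be written to the target file for the last chunk.
    static long[] copyLastChunk() {
        try {
            Path file = Files.createTempFile("longlist-demo", ".bin");
            try (FileChannel ch = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                writeLongs(ch, 0, 11, 12, 13, 14); // full chunk at offset 0
                writeLongs(ch, CHUNK_BYTES, 21);   // last physical chunk: only 8 bytes exist

                ByteBuffer transfer = ByteBuffer.allocate(CHUNK_BYTES);
                readChunk(ch, transfer, 0);           // fills the buffer with 11..14
                readChunk(ch, transfer, CHUNK_BYTES); // short read: only 8 bytes replaced

                long[] out = new long[CHUNK_LONGS];
                transfer.clear(); // expose the whole buffer, as a full-chunk write would
                transfer.asLongBuffer().get(out);
                return out;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(copyLastChunk())); // prints [21, 12, 13, 14]
    }
}
```

Only the first long comes from the truncated chunk. The remaining three are leftovers from the chunk read before it.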

@OlegMazurov OlegMazurov added this to the v0.61 milestone Mar 7, 2025
@artemananiev artemananiev self-assigned this Mar 7, 2025
@artemananiev artemananiev added the Bug, Platform Virtual Map, Platform Data Structures, and Platform labels Mar 7, 2025
@artemananiev
Contributor

I don't think two chunks mentioned in the description are affected, only one of them: the last chunk in the LongListDisk backing file (which may or may not be the chunk with the highest index). All chunks in the middle of the backing file are written in full regardless of their indexes.

A possible fix is to explicitly append a full chunk (filled with zeroes) to the end of the backing file whenever a new chunk is created.
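
A rough sketch of that idea, using a plain FileChannel rather than the real LongListDisk internals (the class, the allocateChunk helper, and the chunk size of 4 longs are hypothetical names and values chosen for illustration): each new chunk is reserved by writing a full block of zeroes at the end of the file, so a later full-chunk read never comes up short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class; not the actual fix proposed in the linked pull request.
class ZeroPaddedChunkDemo {
    static final int CHUNK_LONGS = 4; // assumed chunk size, for illustration only
    static final int CHUNK_BYTES = CHUNK_LONGS * Long.BYTES;

    // Writes the whole buffer at the given file offset.
    static void writeFully(FileChannel ch, ByteBuffer buf, long offset) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf, offset + buf.position());
        }
    }

    // Reserves a new chunk at the end of the file by appending a full block of zeroes.
    static long allocateChunk(FileChannel ch) throws IOException {
        long offset = ch.size();
        // ByteBuffer.allocate() returns a zero-filled buffer.
        writeFully(ch, ByteBuffer.allocate(CHUNK_BYTES), offset);
        return offset;
    }

    // Stores one long at the given slot of an already allocated chunk.
    static void putLong(FileChannel ch, long chunkOffset, int slot, long value) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(value);
        buf.flip();
        writeFully(ch, buf, chunkOffset + (long) slot * Long.BYTES);
    }

    // Reads one chunk into a reused buffer; returns the number of bytes read.
    static int readChunk(FileChannel ch, ByteBuffer buf, long offset) throws IOException {
        buf.clear();
        while (buf.hasRemaining()) {
            if (ch.read(buf, offset + buf.position()) < 0) {
                break;
            }
        }
        return buf.position();
    }

    // Returns the contents of the last physical chunk as read through a reused buffer.
    static long[] lastChunkAfterPadding() {
        try {
            Path file = Files.createTempFile("longlist-padded", ".bin");
            try (FileChannel ch = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                long a = allocateChunk(ch);
                for (int i = 0; i < CHUNK_LONGS; i++) {
                    putLong(ch, a, i, 11 + i); // chunk A holds 11..14
                }
                long b = allocateChunk(ch);
                putLong(ch, b, 0, 21);         // chunk B has only its first slot set

                ByteBuffer transfer = ByteBuffer.allocate(CHUNK_BYTES);
                readChunk(ch, transfer, a);
                int bytesRead = readChunk(ch, transfer, b);
                if (bytesRead != CHUNK_BYTES) {
                    throw new IllegalStateException("short read: " + bytesRead);
                }
                long[] out = new long[CHUNK_LONGS];
                transfer.clear();
                transfer.asLongBuffer().get(out);
                return out;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lastChunkAfterPadding())); // prints [21, 0, 0, 0]
    }
}
```

With the padding in place, the read of the last physical chunk fills the whole buffer, so nothing from the previously read chunk survives in it.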

@OlegMazurov
Contributor Author

I don't think two chunks mentioned in the description are affected...

I agree. I have updated the description.

Projects
Status: 👀 In Review