FileIndex + New File Structure + File Prefetch + Parallel Write #8
Conversation
… number of entries in directory as if we've written files at level0
})
func (r *reader[T]) prefetchNextFile(ctx context.Context) {
	if r.currFileIndex+1 < len(r.fileIndex.Files()) {
		go r.prefetchFile(ctx, r.fileIndex.At(r.currFileIndex+1))
Are there limits on the number of concurrent prefetchFile tasks spawned simultaneously?
Yes, it's always only the next file, so it's limited by that: no more than one prefetch can be in flight at a time.
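To illustrate the bound discussed above, here is a minimal, hypothetical sketch of the single-prefetch-ahead pattern (the `reader` type, its fields, and `runDemo` are illustrative stand-ins, not the actual ethwal types): each advance of the reader spawns at most one prefetch, for the file immediately after the current one.

```go
package main

import (
	"fmt"
	"sync"
)

// reader is a hypothetical, simplified sketch: only the file immediately
// after the current one is ever prefetched, so at most one prefetch
// goroutine is spawned per advance of the reader.
type reader struct {
	files   []string
	curr    int
	wg      sync.WaitGroup
	mu      sync.Mutex
	fetched map[string]bool // which files a background prefetch touched
}

func (r *reader) prefetchNextFile() {
	if r.curr+1 < len(r.files) {
		next := r.files[r.curr+1]
		r.wg.Add(1)
		go func() {
			defer r.wg.Done()
			// Simulated prefetch: the real code would read the file
			// into a buffer for a later Open call to pick up.
			r.mu.Lock()
			r.fetched[next] = true
			r.mu.Unlock()
		}()
	}
}

// runDemo advances through all files, prefetching one ahead each time.
func runDemo() map[string]bool {
	r := &reader{files: []string{"a", "b", "c"}, fetched: map[string]bool{}}
	for r.curr = 0; r.curr < len(r.files); r.curr++ {
		r.prefetchNextFile()
	}
	r.wg.Wait()
	return r.fetched
}

func main() {
	// Only "b" and "c" get prefetched; the first file never does.
	fmt.Println(len(runDemo()))
}
```

The concurrency bound falls out of the structure rather than an explicit limiter: there is no queue of prefetch tasks, only the single "next file" slot.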
return r.readFile(index)
}
}
_ = file.Prefetch(pCtx, r.fs)
The error here is neither logged nor handled.
This is a fire-and-forget background task. If it fails, the file will be opened as usual instead of using the prefetched buffer, so there is no need for logging or handling.
Please have a look at File.Open:
Lines 167 to 176 in 57ad4b3
func (f *File) Open(ctx context.Context, fs storage.FS) (io.ReadCloser, error) {
	f.mu.Lock()
	prefetchedRdr := f.prefetched()
	f.mu.Unlock()
	if prefetchedRdr != nil {
		return prefetchedRdr, nil
	}
	return f.open(ctx, fs)
}
Looking great overall, I added a few comments.
Thanks, I will address tomorrow.
Thanks for addressing my comments! LGTM
LGTM. Nice and clean code. 🎉
A few questions / minor feedback :)
// -- ethwal
// |-- 000563
// | |-- 000256
// | | |-- 000124
// | | | |-- 28f55a4df523b3ef28f55a4df523b3ef28f55a4df523b3ef28f55a4df523b3ef <- ethwal file
// | | |-- 000278
// | | | |-- 28f55a4df523b3ef28f55a4df523b3ef28f55a4df523b3ef28f55a4df523b3dd <- ethwal file
// | |-- 000025
// | | |-- 000967
// | | | |-- 28f55a4df523b3ef28f55a4df523b3ef28f55a4df523b3dd28f55a4df523b3ef <- ethwal file
// |-- .fileIndex
//
// The data structure ensures that there are no more than 1000 directories per level. The filename is a sha-256 hash of
// the first and last block numbers. The hash is used to distribute files evenly across directories.
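As a concrete illustration of the naming scheme in the comment above, here is a hedged sketch of deriving the file name as a sha-256 hash over the first and last block numbers. The function name and the exact byte encoding of the two numbers are assumptions for the example; the real implementation may serialize them differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// ethwalFileName is a hypothetical sketch: hash the (firstBlock, lastBlock)
// pair with sha-256 and hex-encode it, yielding a 64-character name like
// the ones shown in the tree listing. Big-endian uint64 encoding is an
// assumption made for this example.
func ethwalFileName(firstBlock, lastBlock uint64) string {
	var buf [16]byte
	binary.BigEndian.PutUint64(buf[:8], firstBlock)
	binary.BigEndian.PutUint64(buf[8:], lastBlock)
	sum := sha256.Sum256(buf[:])
	return fmt.Sprintf("%x", sum)
}

func main() {
	name := ethwalFileName(1, 1000)
	fmt.Println(len(name)) // 64 hex characters
}
```

Because the hash output is effectively uniform, files land evenly across the directory levels, keeping each directory well under the listing sizes that caused trouble.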
What is the main reason for creating this 3-level folder structure?
"No more than 1000 directories per level": is this some limitation of GCP buckets, or is it so we don't need to list dirs with paging? Anything else?
The local filesystem starts to have issues listing directories with ~20k files; it takes a while. The problem becomes even more visible when listing Google Storage bucket contents: it took a couple of minutes with approx. 40k files.
In addition to that, macOS has been hanging whenever I worked in the vicinity of the ethwal directory.
Peter suggested putting files into directories, each holding a limited number of files, so they can be worked with easily.
cool, thanks for the insights :)