Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYS-6179] Add MultiReadAsync file API. #290

Merged
merged 4 commits into from
Feb 14, 2024
Merged

Conversation

dpetrov4
Copy link

Extending the file API to provide an option to implement asynchronous
multi read operation. The default implementation delegates to AsyncRead.
Updated the AsyncFileReader class implementation to use MultiReadAsync.

@dpetrov4 dpetrov4 force-pushed the dmitri_multi_async branch 2 times, most recently from 86c1e00 to 3e0c7de Compare October 13, 2023 21:07
@dpetrov4 dpetrov4 marked this pull request as ready for review October 18, 2023 19:24
@dpetrov4 dpetrov4 requested review from nbronson, igorcanadi and a team October 18, 2023 19:25
Copy link
Collaborator

@igorcanadi igorcanadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm accepting this PR, but at the same time want to discuss an alternative. This PR reaches fairly deep into RocksDB and will introduce merge conflicts as we try to keep up with upstream. What do you think about leaving RocksDB as is and introducing request batching at our layer?

I'm thinking ReadAsync implementation on our end could wait for a very short period of time, and then batch up any requests that come along in that wait period. That way we might be able to even batch requests from multiple threads if we get lucky. What do you think?

@dpetrov4
Copy link
Author

@igorcanadi

I am not a fan of splitting multi part entities and then trying to reassemble them later, this always seem to cause some kind of tricky issue on top of being inefficient. However, this is just my personal bias -- TAO did something similar in recent years with DB missies and was quite successful at it.
The other side of the story is that I suspect that FB people working on RocksDB will see "the light" and will probably add their version of AsyncMultiGet. I suspect that didn't do it from the beginning because error handling gets tricky.

Anyways, for my taste I would go with what we have here and suffer the merge conflicts if they show up, however I don't have any experience of actually suffering through them, so LMK if you feel I am misguided here.

@igorcanadi
Copy link
Collaborator

@igorcanadi

I am not a fan of splitting multi part entities and then trying to reassemble them later, this always seem to cause some kind of tricky issue on top of being inefficient. However, this is just my personal bias -- TAO did something similar in recent years with DB missies and was quite successful at it. The other side of the story is that I suspect that FB people working on RocksDB will see "the light" and will probably add their version of AsyncMultiGet. I suspect that didn't do it from the beginning because error handling gets tricky.

Anyways, for my taste I would go with what we have here and suffer the merge conflicts if they show up, however I don't have any experience of actually suffering through them, so LMK if you feel I am misguided here.

Sounds good, let's ship this.

@dpetrov4 dpetrov4 force-pushed the dmitri_multi_async branch 6 times, most recently from 1b9a4fa to 6792b57 Compare December 20, 2023 14:50
@dpetrov4 dpetrov4 force-pushed the dmitri_multi_async branch 4 times, most recently from 1d6d20f to 6700da7 Compare December 22, 2023 17:28
@dpetrov4 dpetrov4 force-pushed the dmitri_multi_async branch 4 times, most recently from f9284f3 to a1b8d6e Compare January 21, 2024 19:00
@@ -5683,7 +5683,7 @@ Status VersionSet::LogAndApplyHelper(ColumnFamilyData* cfd,
}

if (edit->HasReplicationSequence()) {
replication_sequence_ = edit->GetReplicationSequence();
replication_sequence_ = edit->GetReplicationSequence();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might conflict with @seckcoder 's changes, and since it's only a formatting change, can we revert it pls?

std::function<void(const FSReadRequest*, size_t, void*)> cb_;
void* cb_arg_;
uint64_t start_time_;
#ifndef ROCKSDB_LITE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROCKSDB_LITE is removed in 8.0, which I plan to upgrade to very soon, so let's remove the guard here.

} else if (!req.status.IsAborted()) {
RecordTick(stats_, ASYNC_READ_ERROR_COUNT, 1);
}
#ifndef ROCKSDB_LITE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Comment on lines +534 to +553
// Returns CloudManifest file name for a given db.
virtual std::string CloudManifestFile(const std::string& dbname) = 0;

virtual CloudManifest* GetCloudManifest() = 0;

// TODO(wei): this function is used to temporarily support open db and switch
// cookie. Change it to use the cookie from options once migration complete.
virtual IOStatus CreateCloudManifest(const std::string& local_dbname,
const std::string& cookie) = 0;

virtual IOStatus LoadCloudManifest(const std::string& local_dbname,
bool read_only) = 0;

virtual IOStatus UploadCloudManifest(const std::string& local_dbname,
const std::string& cookie) const = 0;

// Prepare a local directory for use as a clone of the cloud storage
virtual IOStatus SanitizeLocalDirectory(const DBOptions& options,
const std::string& local_name,
bool read_only) = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes unrelated?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to cleaning up the APIs so that we can implement DisaggFileSystem in Rockset.

Comment on lines 676 to 684
virtual IOStatus Poll(std::vector<void*>& io_handles,
size_t /*min_completions*/) {
// RocksDB-Cloud contribution begin
if (!io_handles.empty()) {
// Default implementations on async operations do not return any
// io_handles. If you override them, you need to override Poll as well.
return IOStatus::NotSupported("Poll");
}
// RocksDB-Cloud contribution end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just for added safety? I prefer less safety if that means less merge conflicts in the future :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okayyy.

//
// When the read request is completed, callback function specified in cb
// should be called with arguments cb_arg and the result populated in
// FSReadRequest with result and status fileds updated by FileSystem.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fileds/fields

Extending the file API to provide an option to implement asynchronous
multi read operation. The default implementation delegates to AsyncRead.
Updated the AsyncFileReader class implementation to use MultiReadAsync.
Expanidng the CloudFileSystem API to include methods
for handling CloudManifest. This allows us to better hide
CloudFileSystemImpl and allow different implementations of
CloudFileSystem.
@dpetrov4 dpetrov4 merged commit 61223e2 into master Feb 14, 2024
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants