-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Handle nowait support for reads and writes independently #2987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The value is going to become more than yes/no switch and this patch prepares for that. For now only two values -- yes and no. Signed-off-by: Pavel Emelyanov <[email protected]>
Continuation of the previous patch -- now the filesystem is also prepared to export is "nowait" facility as enum class. Signed-off-by: Pavel Emelyanov <[email protected]>
This is the third state, which allows RWF_NOWAIT flag for read requests only. Signed-off-by: Pavel Emelyanov <[email protected]>
In the end of the day the nowait mode to use depends on three sources of information: the filesystem itself, the kernel version and the reactor option (CLI switch). By now the selection was between yes and no modes, this patch wires the read_only mode detection and activation. Signed-off-by: Pavel Emelyanov <[email protected]>
For the record: |
@avikivity , please review |
1 similar comment
@avikivity , please review |
What's the goal? Why would one enable nowait for one and not the other? Is it because we think nowait is misreported for writes but not reads? |
It's not that someone enables or disables it, this "switch" is not controlled by any new option. The old option to enable/disable RWF_NOWAIT altogether doesn't change its behavior. The change here is only about automatic selection of whether or not seastar will use this bit on iocbs. And the suggestion is -- reads can be submitted with nowait and writes not. Or is your question -- why should we submit writes without nowait at all? |
Yes, why the different behavior for reads and writes? |
Because nowait writes have many chances to instantly EAGAIN and it's greatly out of our control. Major issue is cmtime update, someone (us?) have to fix the kernel. Two options Both are turning to be rather long journey, then wait for the fixes to reach cloud instances. And seastar needs to live with it somehow Other than cmtime update, XFS files are still append-challenged, truncating it ahead of writes doesn't help at all, extent-allocation-size-hint helps partially. But that's, probably, minor |
Running a "restore from backup" test with #2975 and one more metrics. ![]() I also added a metrics to show the number of context switches [1]. For now it looks like this ![]() but no comments about it yet. My plan is to re-run the very same test with explicit ban of RWF_NOWAIT for writes and check if context switches grow up. [1] --- a/src/core/reactor.cc
+++ b/src/core/reactor.cc
@@ -2543,6 +2543,11 @@ void reactor::register_metrics() {
io_fallback_counter("file_operation", internal::thread_pool_submit_reason::file_operation),
// total_operations value:DERIVE:0:U
io_fallback_counter("process_operation", internal::thread_pool_submit_reason::process_operation),
+ sm::make_counter("ru_nvcsw", sm::description("Number of voluntary context switches"), [] {
+ struct ::rusage ru;
+ ::getrusage(RUSAGE_THREAD, &ru);
+ return ru.ru_nvcsw;
+ })
});
_metric_groups.add_group("memory", { |
Yes, but how does it help to disable RWF_NOWAIT? Instead of a retry, you'll stall the reactor. I can understand this logic: the write EAGAINs are bogus and the request without RWF_NOWAIT will succeed without stalling. But is this the case? |
I'm collecting the data to see if we really do. I agree that we trade guaranteed retry for potential stall though.
Almost. Request without RWF_NOWAIT will likely succeed without stalling, that's my point, and once I have some stats, I'll post it here. And also even when cmtime update is fixed, we still need to live somehow until the fix propagates to cloud instances. Maybe the way to go in such cases is to route all writes into thread-pool retry right at once, without waiting for EAGAIN from the kernel? |
Whether or not to set RWF_NOWAIT flag for iocb is controlled by the nowait_works bit that travels a long journey from reactor option via fs_info, then file, then struct request (or more exactly -- one of its united sub-structures). Nowadays this bit is boolean, but we've seen that it might be ultimately broken for writes and work for reads. This PR turns the nowait_works bit on file and fs_info to be a tri-state enum. The request struct still carries it as bit (because they are independent for reads and writes) as well as reactor option remains boolean.
"By default" (i.e. -- on modern kernels with no --linux-aio-nowait option) the mode activates to read_only for XFS. As a result, reads are submitted with the flag and writes are submitted without it. With that we trade guaranteed write-retry for potential delay in the kernel submitting it. Hopefully, chances for the latter are low enough.
refs: #2974