kvserver: introduce blocking vfs.FS; add some tests #98852

Closed
wants to merge 4 commits

Conversation

tbg
Member

@tbg tbg commented Mar 17, 2023

edit: with the third commit in place, the test in 2) now passes.


This introduces a `vfs.FS` implementation that can be toggled to block
on any write activity.
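
For illustration, here is a minimal sketch of what such a toggleable blocking FS could look like, assuming the `github.com/cockroachdb/pebble/vfs` package. The `BlockingFS` name, the `Wrap`/`Stall`/`Resume` helpers, and the set of wrapped methods are illustrative, not this PR's actual implementation:

```go
// Package blockingfs sketches a vfs.FS wrapper whose write paths can be
// toggled to block, simulating a stalled disk. Illustrative only.
package blockingfs

import (
	"sync"

	"github.com/cockroachdb/pebble/vfs"
)

// BlockingFS wraps a vfs.FS so that write activity can be toggled to block.
// Method signatures assume the vfs.FS interface around the time of this PR.
type BlockingFS struct {
	vfs.FS // methods not overridden below pass through to the wrapped FS

	mu      sync.Mutex
	stalled chan struct{} // non-nil while writes should block
}

// Wrap returns a BlockingFS around fs (e.g. vfs.Default or vfs.NewMem()).
func Wrap(fs vfs.FS) *BlockingFS { return &BlockingFS{FS: fs} }

// Stall makes subsequent write activity block until Resume is called.
func (b *BlockingFS) Stall() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.stalled == nil {
		b.stalled = make(chan struct{})
	}
}

// Resume unblocks all writers currently parked in maybeBlock.
func (b *BlockingFS) Resume() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.stalled != nil {
		close(b.stalled)
		b.stalled = nil
	}
}

func (b *BlockingFS) maybeBlock() {
	b.mu.Lock()
	ch := b.stalled
	b.mu.Unlock()
	if ch != nil {
		<-ch // parked until Resume closes the channel
	}
}

// Create blocks while stalled and wraps the returned file so that its
// Write and Sync calls also respect the stall.
func (b *BlockingFS) Create(name string) (vfs.File, error) {
	b.maybeBlock()
	f, err := b.FS.Create(name)
	if err != nil {
		return nil, err
	}
	return blockingFile{File: f, fs: b}, nil
}

type blockingFile struct {
	vfs.File
	fs *BlockingFS
}

func (f blockingFile) Write(p []byte) (int, error) {
	f.fs.maybeBlock()
	return f.File.Write(p)
}

func (f blockingFile) Sync() error {
	f.fs.maybeBlock()
	return f.File.Sync()
}
```

A real version would presumably also wrap the other write paths (`ReuseForWrite`, `OpenDir` for directory syncs, `Remove`, `Rename`, etc.), since a stalled disk affects those too.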

Add two tests:

1. check that even though a durable commit is inflight, we can still
   write non-durably. I.e. at the pebble level there is no immediate
   reason why we couldn't keep *applying* the raft log even though log
   *appends* are stuck (see the sketch after this list).
2. end-to-end test where we set up a three node cluster, send unique
   writes and stall the leaseholder's disk, hoping for the writes to
   keep going through. This fails, as is expected, because raft does
   not even put up entries for application if they haven't been durably
   appended to the leader's log (which they can't be due to the stall
   we're injecting).
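
A rough sketch of the shape of test 1), assuming pebble's public API (`pebble.Open`, `pebble.Sync`/`pebble.NoSync`) and the illustrative `BlockingFS` above; the test name and structure here are illustrative, not the PR's actual test:

```go
package blockingfs

import (
	"testing"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

// TestNonDurableWritesDuringStall: while a durable (synced) commit is stuck
// on the stalled FS, a non-durable commit should still go through.
func TestNonDurableWritesDuringStall(t *testing.T) {
	fs := Wrap(vfs.NewMem())
	db, err := pebble.Open("", &pebble.Options{FS: fs})
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	fs.Stall()
	defer fs.Resume()

	// A durable commit blocks: its WAL write/sync can't reach "disk".
	syncDone := make(chan error, 1)
	go func() {
		syncDone <- db.Set([]byte("k-durable"), []byte("v"), pebble.Sync)
	}()

	// A non-durable commit only needs to land in the in-memory WAL buffer
	// and memtable, so (this is the property test 1 asserts) it should
	// complete despite the stall.
	if err := db.Set([]byte("k-nondurable"), []byte("v"), pebble.NoSync); err != nil {
		t.Fatal(err)
	}

	// The durable commit should still be parked behind the stalled FS.
	select {
	case err := <-syncDone:
		t.Fatalf("durable commit unexpectedly completed during stall: %v", err)
	default:
	}

	fs.Resume()
	if err := <-syncDone; err != nil {
		t.Fatal(err)
	}
}
```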

If we can get 2) to pass, we have reason to believe that we can
address #88699 (though we'd need to verify with "real" disk stalls
and higher write volumes/stall durations, and would certainly need to
configure pebble to allow much more data to accumulate in memory;
sketched below).
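
A hedged illustration of that kind of tuning, using pebble `Options` fields that do exist (`MemTableSize`, `MemTableStopWritesThreshold`, `L0CompactionThreshold`, `L0StopWritesThreshold`); the helper name and the values are placeholders, not recommendations:

```go
package blockingfs

import (
	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

// stallTolerantOptions returns pebble options tuned (illustratively) to let
// more unflushed data accumulate in memory while the disk is stalled.
func stallTolerantOptions(fs vfs.FS) *pebble.Options {
	return &pebble.Options{
		FS: fs, // e.g. the blocking FS from above, or vfs.Default

		// Larger and more numerous memtables let writes pile up in memory
		// while the WAL/flush path is stuck.
		MemTableSize:                128 << 20, // 128 MiB per memtable
		MemTableStopWritesThreshold: 8,         // allow several queued memtables

		// Relax L0 backpressure so the flush backlog doesn't stop writes
		// right away once flushes resume.
		L0CompactionThreshold: 4,
		L0StopWritesThreshold: 1000,
	}
}
```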

The additional benefit test 2) provides is that it prints who is
blocked. At the time of writing, it's encouraging - we see just
the LogWriter blocked on a call to `Write` (via `flushPending`).

The next step would be to hack in the ability to apply entries
that aren't durable on the leader and to see what we get.

Touches #88699.

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@tbg
Member Author

tbg commented Mar 17, 2023

cc @jbowens @nvanbenschoten @erikgrinaker since we talked about #88699.

(Internal) Slack thread

@tbg
Member Author

tbg commented Mar 17, 2023

Going to try to hook this up with etcd-io/raft#37

tbg added 2 commits March 18, 2023 20:37
Surprised me, but TestWriteWithStuckLeaseholder now passes.

Epic: none
Release note: None
@tbg
Member Author

tbg commented Mar 28, 2023

From the above Slack thread:

I got raft patched to also put up entries for application that are committed via a non-local quorum, and magically things work... until the ~257th write, then it stalls.

Jackson then had some responses on where the stalls would be.

At this point I'm going to disengage from the prototype; we can pick this back up if we're interested in making progress in earnest. There seems to be reason to believe that with some pebble work we could mask short disk stalls or latency degradations without necessarily exposing them to users. What's unexplored is how "real" disk stalls affect disk reads: we would need to serve all pebble reads from the cache during evaluation and replication, or requests get stuck anyway.

@tbg
Member Author

tbg commented Mar 28, 2023

Closing: the work is tracked in #88699.

@jbowens any interest in putting the blocking `vfs.FS` anywhere? I'm not sure how disk stalls are being tested right now; I expected something similar to exist for that use case, but it doesn't seem to.

@tbg tbg closed this Mar 28, 2023
@jbowens
Collaborator

jbowens commented Mar 28, 2023

Yeah, maybe we should start a pebble/vfs/vfstest package that can be consumed within both Pebble and Cockroach. Our current disk stall unit tests use a completely mocked fs: vfs/disk_health_test.go

IIRC, there are also a few tests that use errorfs and a custom InjectorFunc to induce blocking. We could also move errorfs out of the internal package so that it can be used by cockroach.
