
Fix WBM concurrency control, Add SetAllowStall(), Cleanup #11253

Closed · @hx235 wants to merge 3 commits from the runtime-changeable-allow-stall branch

Conversation

@hx235 (Contributor) commented Feb 25, 2023

Context:
Allowing WBM::allow_stall_ to be changed at runtime gives users flexibility. To do that, we need to close a few gaps in WBM's concurrency control so this parameter change can happen concurrently with other operations.

Summary:

  • Add synchronization to ALL of WBM's data members, since they all feed into the decisions below. Previously, the dynamically changeable buffer_size_ was not protected against concurrent changes from other threads. In particular, use one internal lock instead of several (see the sketch after this list):
Write => optionally charge memory to cache => check stall condition (allow_stall_, enabled(), buffer_size_ vs. memory usage) => if stalled, create stall and wait, and ShouldFlush() returns true => flush decreases memory usage (optionally releasing memory from cache) => stall condition no longer holds => notify waiting writes to stop stalling
  • Add a new function SetAllowStall()
  • Misc:
    • A few cleanups in the logic - see the PR conversation.
    • Clarify class/function comments about concurrency; made some functions that were public but should only be called internally by RocksDB private
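A minimal sketch of the single-lock approach described above (member layout and the helper MaybeEndWriteStallLocked() are assumptions for illustration, not the PR's exact code):

#include <cassert>
#include <cstddef>
#include <mutex>

class WriteBufferManager {
 public:
  // Runtime toggle added by this PR; serialized by the single wbm_mutex_.
  void SetAllowStall(bool new_allow_stall) {
    std::unique_lock<std::mutex> lock(wbm_mutex_);
    allow_stall_ = new_allow_stall;
    if (!allow_stall_) {
      // Stalling was just disallowed: wake up any writers currently
      // stalled by this WBM.
      MaybeEndWriteStallLocked();
    }
  }

  void SetBufferSize(size_t new_size) {
    if (new_size == 0) {  // setting the buffer size to 0 at runtime is rejected
      assert(false);
      return;
    }
    std::unique_lock<std::mutex> lock(wbm_mutex_);
    buffer_size_ = new_size;
    MaybeEndWriteStallLocked();  // a size change can change the stall decision
  }

 private:
  // Hypothetical helper: re-evaluates the stall condition and signals any
  // stalled writers; requires wbm_mutex_ to be held.
  void MaybeEndWriteStallLocked() { /* signal stalled writers; see PR */ }

  mutable std::mutex wbm_mutex_;  // single lock guarding all members below
  bool allow_stall_ = false;
  size_t buffer_size_ = 0;
};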

Test:

  • New UT
  • Performance test
./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3

pre-change: fillseq [AVG 38 runs] : 960 (± 276) ops/sec; 0.5 (± 0.1) MB/sec
post-change (no regression but a slight improvement): fillseq [AVG 38 runs] : 997 (± 298) ops/sec; 0.5 (± 0.2) MB/sec

@hx235 changed the title from "Draft" to "[ForDiscussion] Improve WBM concurrency control + add SetAllowStall + cleanup/refactory" on Feb 25, 2023
@hx235 force-pushed the runtime-changeable-allow-stall branch from cae3e96 to 8bbb04a on February 27, 2023 19:57
@hx235 force-pushed the runtime-changeable-allow-stall branch from 8bbb04a to ed61838 on March 16, 2023 19:41
@hx235 changed the title from "[ForDiscussion] Improve WBM concurrency control + add SetAllowStall + cleanup/refactory" to "Improve WBM concurrency control + add SetAllowStall + cleanup/refactory" on Mar 16, 2023
@hx235 force-pushed the runtime-changeable-allow-stall branch from ed61838 to 9eeeb70 on March 16, 2023 23:12
@hx235 changed the title from "Improve WBM concurrency control + add SetAllowStall + cleanup/refactory" to "Improve WBM concurrency control, add SetAllowStall(), cleanup/refractory" on Mar 16, 2023
@hx235 changed the title from "Improve WBM concurrency control, add SetAllowStall(), cleanup/refractory" to "Improve WBM concurrency control and add SetAllowStall()" on Mar 16, 2023
@hx235 changed the title from "Improve WBM concurrency control and add SetAllowStall()" to "Fix WBM concurrency control, Add SetAllowStall()" on Mar 16, 2023
@hx235 force-pushed the runtime-changeable-allow-stall branch from 9eeeb70 to 5ee9591 on March 17, 2023 00:08

private:
@hx235 (Contributor, Author) commented Mar 17, 2023

Cleanup 1: make the functions that are documented as "should be called within RocksDB internal" private.

@hx235 changed the title from "Fix WBM concurrency control, Add SetAllowStall()" to "Fix WBM concurrency control, Add SetAllowStall(), cleanup" on Mar 17, 2023
Comment on lines -169 to -171
// Value should only be changed by BeginWriteStall() and MaybeEndWriteStall()
// while holding mu_, but it can be read without a lock.
std::atomic<bool> stall_active_;
@hx235 (Contributor, Author) commented Mar 17, 2023

Cleanup 2: I don't see why stall_active_ and its related functions are needed. It seems that as long as our stall condition check (ShouldStall()) is implemented correctly, we will always have stall_active_ == ShouldStall(). But let me know if I overlooked anything :)

@hx235 (Contributor, Author) commented:

@akankshamahajan15 shared a perspective that stall_active_ might exist as a perf optimization to reduce lock contention in the case of multiple DBs sharing the same WBM. She will cite more of the previous discussion on this soon.

I am not entirely sure about keeping it yet, mainly because such a perf optimization makes the concurrency model of WBM harder to understand: some members can be accessed without a lock while others can't.

[TODO for me] Understand the previous conversation on having stall_active_ and the need for it; reconsider the perf cost vs. model simplicity.
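For context, a minimal sketch of the trade-off under discussion (the struct shapes and member names are illustrative assumptions, not the actual WBM code): an atomic flag lets writers of every DB sharing the WBM peek at the stall state without touching the mutex, while dropping it funnels every check through the lock.

#include <atomic>
#include <cstddef>
#include <mutex>

struct WithAtomicFlag {  // keep stall_active_: lock-free fast path
  std::atomic<bool> stall_active_{false};
  bool IsStallActive() const {
    return stall_active_.load(std::memory_order_relaxed);  // no mutex taken
  }
};

struct WithoutAtomicFlag {  // drop stall_active_: every check contends on the lock
  bool ShouldStall() const {
    std::unique_lock<std::mutex> lock(wbm_mutex_);  // all writers of all DBs queue here
    return allow_stall_ && memory_used_ >= buffer_size_;
  }
  mutable std::mutex wbm_mutex_;
  bool allow_stall_ = false;
  size_t memory_used_ = 0;
  size_t buffer_size_ = 0;
};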

Comment on lines +57 to +60
if (new_size == 0) {
  assert(false);
  return;
}
@hx235 (Contributor, Author) commented Mar 17, 2023

Cleanup 3: based on a comment in the code, "Cannot early-exit on !enabled() because SetBufferSize(0) needs to unblock...", I believe we don't want the buffer size to be set to 0 at runtime. So I added an explicit check here for both prod and debug builds.

Comment on lines -134 to -137
// If the node was not consumed, the stall has ended already and we can signal
// the caller.
if (!new_node.empty()) {
  new_node.front()->Signal();
@hx235 (Contributor, Author) commented Mar 17, 2023

Cleanup 4: I don't see why we need to signal wbm_stall here. If wbm_stall has been blocked because another writer thread of the same DB set it to blocked, that writer thread is responsible for unblocking/signaling this wbm_stall, not the current one. But again, let me know if I missed anything :)

Comment on lines -143 to -144
// Cannot early-exit on !enabled() because SetBufferSize(0) needs to unblock
// the writers.
@hx235 (Contributor, Author) commented:

Question 2: I am not quite clear about this comment.

Does it mean we don't need to early-exit on !enabled() because that won't happen, as "SetBufferSize(0) needs to unblock the writers"?

Or does it mean we are not able to early-exit because we don't meet the requirement that "SetBufferSize(0) needs to unblock the writers"?

// Cannot early-exit on !enabled() because SetBufferSize(0) needs to unblock
// the writers.
if (!allow_stall_) {
if (ShouldStall()) {
@hx235 (Contributor, Author) commented Mar 17, 2023

Cleanup 5: I decided to encapsulate the stall condition check inside ShouldStall() and use this function as much as possible whenever we need to check whether the stall condition has been met. This makes it easier to change the stall condition check in the future, e.g., making some of it runtime-changeable or adding a new stall condition (cc @ajkr: you know what I'm talking about :p :p :p - the stall on global memory full).
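A minimal sketch of the encapsulation described here (the exact condition is an assumption pieced together from the factors listed in the Summary; the real check may differ):

// Caller is assumed to hold wbm_mutex_ under this PR's single-lock model.
bool WriteBufferManager::ShouldStall() const {
  return allow_stall_ && enabled() &&
         memory_used_ >= buffer_size_;  // future stall conditions slot in here
}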

return; // Stall conditions have not resolved.
}

// Perform all deallocations outside of the lock.
@hx235 (Contributor, Author) commented:

Note: we can't do this now, since we need to hold the lock while calling MaybeEndWriteStall(). See the perf test results for any perf regression concern.

@@ -173,9 +179,8 @@ void WriteBufferManager::RemoveDBFromQueue(StallInterface* wbm_stall) {

// Deallocate the removed nodes outside of the lock.
std::list<StallInterface*> cleanup;

if (enabled() && allow_stall_) {
@hx235 (Contributor, Author) commented Mar 17, 2023

Optimization 2: it appears to me that we shouldn't check enabled() && allow_stall_ again with this PR, and didn't need to check it before this PR either. Reasons:

  • allow_stall_ can be changed to false at runtime, and we don't want that to interfere with RemoveDBFromQueue() as a DB cleanup.
  • Even when allow_stall_ was not runtime-changeable, queue_ wouldn't contain anything if enabled() && allow_stall_ was false, because no stall would have been active then.

assert(wbm_stall != nullptr);
assert(allow_stall_);

// Allocate outside of the lock.
@hx235 (Contributor, Author) commented Mar 17, 2023

Question 3: just to clarify, this "allocate" means allocating the list of pointers new_node, not the actual StallInterface object wbm_stall, right? The same goes for all the deallocation done outside the lock below, like // Deallocate the removed nodes outside of the lock.

@hx235 force-pushed the runtime-changeable-allow-stall branch 2 times, most recently from 342301b to 4cc5b83 on March 17, 2023 02:45
@hx235 changed the title from "Fix WBM concurrency control, Add SetAllowStall(), cleanup" to "Fix WBM concurrency control, Add SetAllowStall(), Cleanup" on Mar 17, 2023
@hx235 requested a review from akankshamahajan15 on March 17, 2023 04:50
@hx235 force-pushed the runtime-changeable-allow-stall branch from 4cc5b83 to dfedf99 on March 17, 2023 06:15
@facebook-github-bot (Contributor) commented:

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 requested a review from ajkr on March 17, 2023 06:27
void FreeMemWithCache(size_t mem);

// Mutex used to protect WriteBufferManager's data variables.
mutable std::mutex wbm_mutex_;
@hx235 (Contributor, Author) commented:

Cleanup 6: use one mutex instead of multiple mutexes guarding separate groups of WBM data, for simplicity.
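Roughly, the consolidation looks like this (a sketch only; the member list and the pre-PR lock split are illustrative assumptions, not the full class):

#include <cstddef>
#include <list>
#include <mutex>

class StallInterface;

// Before (illustrative): separate locks for separate member groups, e.g.
//   std::mutex buffer_mutex_;  // guarded buffer_size_
//   std::mutex queue_mutex_;   // guarded queue_
// After: one mutex for all of WBM's mutable state.
class WriteBufferManager {
 private:
  mutable std::mutex wbm_mutex_;      // guards every member below
  size_t buffer_size_ = 0;
  size_t memory_used_ = 0;
  size_t memory_active_ = 0;
  bool allow_stall_ = false;
  std::list<StallInterface*> queue_;  // DBs currently stalled on this WBM
};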

@facebook-github-bot (Contributor) commented:

@hx235 has updated the pull request. You must reimport the pull request before landing.

void WriteBufferManagerStallWrites();
// If stall conditions are met, begin stalling of writes with help of
// `WriteBufferManager`
void MaybeWriteBufferManagerStallWrites();
@hx235 (Contributor, Author) commented:

Cleanup 7: rename this function to reflect that we still perform the stall condition check again within it.

Comment on lines -1908 to -1912
static_cast<WBMStallInterface*>(wbm_stall_.get())
    ->SetState(WBMStallInterface::State::BLOCKED);
@hx235 (Contributor, Author) commented:

Cleanup 8: make this part of the responsibility of MaybeBeginWriteStall(), as the static_cast<WBMStallInterface*> cast indicates we are dealing with something too low-level here.

@hx235 force-pushed the runtime-changeable-allow-stall branch from ad47215 to fbfd947 on March 22, 2023 00:13

@facebook-github-bot (Contributor) commented:

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor) commented:

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

  if (enabled()) {
    memory_active_.fetch_sub(mem, std::memory_order_relaxed);
  }
}

void WriteBufferManager::FreeMem(size_t mem) {
  std::unique_lock<std::mutex> lock(wbm_mutex_);
A reviewer (Contributor) commented:

Can we acquire the lock inside these functions (FreeMemWithCache, MaybeEndWriteStall, etc.) to reduce the lock duration?

@hx235 (Contributor, Author) replied:

Is there a concrete concern about where and how such lock duration can be problematic? As far as I know, the longest duration comes from cache_res_mgr_->UpdateCacheReservation() evicting dummy entries, linear in the amount of mem to be freed from the cache. Would that be long enough to be a problem?

My intention is to make FreeMem an atomic function, since reads/writes to memory_used_ and MaybeEndWriteStall() are closely related through ShouldStall(). Making these two into one atomic operation relieves us from thinking about several data races.

We also have a plan to factor the returned status s of cache_res_mgr_->UpdateCacheReservation() in WriteBufferManager::FreeMemWithCache() (or ReserveMemWithCache()) into the decision of whether we should stall; right now we only do s.PermitUncheckedError(). So from this perspective too, I'd like to make FreeMemWithCache atomic with the reads/writes to memory_used_ and MaybeEndWriteStall().
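To make the atomicity argument concrete, a sketch of the intended shape (an assumption based on this discussion; FreeMemWithCache and MaybeEndWriteStall come from the PR, but their bodies and locking contracts here are illustrative):

// One critical section covers both the accounting update and the stall
// re-check, so no writer can observe memory_used_ and the stall state
// out of sync with each other.
void WriteBufferManager::FreeMem(size_t mem) {
  std::unique_lock<std::mutex> lock(wbm_mutex_);
  if (cache_res_mgr_ != nullptr) {
    FreeMemWithCache(mem);  // updates memory_used_ and the cache reservation
  } else {
    memory_used_ -= mem;
  }
  MaybeEndWriteStall();  // assumed to require wbm_mutex_ held in this design
}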

@@ -84,26 +117,25 @@ void WriteBufferManager::ReserveMemWithCache(size_t mem) {
}

void WriteBufferManager::ScheduleFreeMem(size_t mem) {
  std::unique_lock<std::mutex> lock(wbm_mutex_);
A reviewer (Contributor) commented:

IIRC, we don't use a lock for memory_active_ and other similar variables, to avoid locking/releasing the mutex, which can be expensive; these variables are already atomic, so they don't really need the mutex.

@hx235 (Contributor, Author) replied:

Since we read buffer_size_'s value with std::memory_order_relaxed, without locking, should we be concerned about not seeing the latest value written to buffer_size_ by another thread through SetBufferSize(), because of std::memory_order_relaxed?

This is another reason, in addition to "making related operations atomic" for simplicity, as mentioned in https://github.com/facebook/rocksdb/pull/11253/files#r1151045690.

This reminds me that we should sync up on whether we care about these two things, as the answers to these questions can impact this PR, e.g., removing some of the locks added.
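A tiny sketch of the question being raised (the accessor shapes are assumptions): relaxed loads are atomic and an updated value becomes visible eventually, but relaxed operations establish no ordering with reads of other members, whereas the mutex's acquire/release pairing makes everything read under the lock mutually consistent.

#include <atomic>
#include <cstddef>
#include <mutex>

struct LockFreeRead {
  std::atomic<size_t> buffer_size_{0};
  size_t buffer_size() const {
    // Atomic and eventually visible, but unordered relative to other
    // members: a reader may pair a fresh buffer_size_ with a stale
    // memory_used_, or vice versa.
    return buffer_size_.load(std::memory_order_relaxed);
  }
};

struct LockedRead {
  mutable std::mutex wbm_mutex_;
  size_t buffer_size_ = 0;
  size_t buffer_size() const {
    // Acquire/release on the mutex orders this read after the latest
    // SetBufferSize() that released the same lock.
    std::unique_lock<std::mutex> lock(wbm_mutex_);
    return buffer_size_;
  }
};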

@hx235 (Contributor, Author) commented Mar 28, 2023

Update: based on the discussion with @akankshamahajan15, most of the "Fix WBM concurrency control" part will probably add overhead to each write, as it requires many threads (write threads × number of DBs sharing the same WBM) to compete for one lock. Therefore, I am planning to keep only the SetAllowStall() part. Since this PR is full of conversation focused on WBM concurrency control, I will close this one (for the historical record) and open another one.

@hx235 closed this on Mar 28, 2023