stmtstats: investigate mutex contention in ssmemstorage.(*Container).getStatsForStmtWithKey #140590

Open · tbg opened this issue Feb 6, 2025 · 2 comments

Labels: branch-master (Failures and bugs on the master branch), C-performance (Perf of queries or internals; solution not expected to change functional behavior), o-perf-efficiency (Related to performance efficiency), P-2 (Issues/test failures with a fix SLA of 3 months), T-observability

tbg (Member) commented Feb 6, 2025
See #139557 for BenchmarkParallelSysbench, whose mutex profile prompted creation of this issue.

The mutex profile there is potentially interesting because it shows a very prominently contended mutex in ssmemstorage.(*Container).getStatsForStmtWithKey. It accounts for >50% of contention (329.36s) in a benchmark invocation that ran for ~122s. One way to read this is that we lose ~2.7 (= 329/122) of the 24 worker threads to contention, so throughput could be higher by a factor of 24/(24−2.7) ≈ 24/21.3 ≈ 1.13x. That's 13%, which would be pretty massive. I would take this with a large grain of salt, though: I didn't see this contention as prominently in mutex profiles from a real cluster [1], and it may instead be an artifact of having too many worker goroutines (we have 24 in this benchmark, but throughput is only ~10x higher than when running with a single worker). On the other hand, the mutex being hot might highlight one reason why linear scaling breaks down at that point.
We should take a look.

[Mutex contention profile: image]
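(For reference, a mutex profile like the one above can be collected for a Go benchmark with the standard -mutexprofile test flag, or programmatically along the lines of the sketch below. The helper name is hypothetical and not part of the CockroachDB tree; it only uses the stock runtime/pprof API, and the output is inspected with go tool pprof.)

package profutil

import (
	"os"
	"runtime"
	"runtime/pprof"
)

// withMutexProfile is a hypothetical helper: it records mutex contention
// events while workload runs and writes them to path for inspection with
// `go tool pprof`.
func withMutexProfile(path string, workload func()) error {
	// Record every contention event for the duration of the workload, then
	// restore the previous sampling rate.
	prev := runtime.SetMutexProfileFraction(1)
	defer runtime.SetMutexProfileFraction(prev)

	workload()

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return pprof.Lookup("mutex").WriteTo(f, 0)
}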

Epic: CRDB-42584

Jira issue: CRDB-47252

Footnotes

[1] https://github.com/cockroachdb/cockroach/issues/140235

tbg (Member, Author) commented Feb 6, 2025

A quick experiment with this very hacky diff on top of #139557


commit b99b2d6e9e34e67044c252c566c5e5354f7fb50d
Author: Tobias Grieger <[email protected]>
Date:   Thu Feb 6 15:02:48 2025 +0100

    [dnm] use RLock for getStatsFor{Stmt,Txn}WithKey

diff --git a/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go b/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
index a0d4455e138..464d11fced9 100644
--- a/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
+++ b/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
@@ -477,8 +477,8 @@ func (s *Container) getStatsForStmt(
 func (s *Container) getStatsForStmtWithKey(
 	key stmtKey, stmtFingerprintID appstatspb.StmtFingerprintID, createIfNonexistent bool,
 ) (stats *stmtStats, created, throttled bool) {
-	s.mu.Lock()
-	defer s.mu.Unlock()
+	s.mu.RLock()
+	defer s.mu.RUnlock()
 	return s.getStatsForStmtWithKeyLocked(key, stmtFingerprintID, createIfNonexistent)
 }
 
@@ -496,8 +496,13 @@ func (s *Container) getStatsForStmtWithKeyLocked(
 		}
 		stats = &stmtStats{}
 		stats.ID = stmtFingerprintID
+		// Bad hack alert!
+		s.mu.RUnlock()
+		s.mu.Lock()
 		s.mu.stmts[key] = stats
 		s.mu.sampledStatementCache[key.sampledPlanKey] = struct{}{}
+		s.mu.Unlock()
+		s.mu.RLock()
 
 		return stats, true /* created */, false /* throttled */
 	}
@@ -509,8 +514,8 @@ func (s *Container) getStatsForTxnWithKey(
 	stmtFingerprintIDs []appstatspb.StmtFingerprintID,
 	createIfNonexistent bool,
 ) (stats *txnStats, created, throttled bool) {
-	s.mu.Lock()
-	defer s.mu.Unlock()
+	s.mu.RLock()
+	defer s.mu.RUnlock()
 
 	return s.getStatsForTxnWithKeyLocked(key, stmtFingerprintIDs, createIfNonexistent)
 }
@@ -531,7 +536,12 @@ func (s *Container) getStatsForTxnWithKeyLocked(
 		}
 		stats = &txnStats{}
 		stats.statementFingerprintIDs = stmtFingerprintIDs
+		// Hack time.
+		s.mu.RUnlock()
+		s.mu.Lock()
 		s.mu.txns[key] = stats
+		s.mu.Unlock()
+		s.mu.RLock()
 		return stats, true /* created */, false /* throttled */
 	}
 	return stats, false /* created */, false /* throttled */

shows promising results:

old:  7fbc3d2 tests: extract BenchmarkParallelSysbench
new:  b99b2d6 [dnm] use RLock for getStatsFor{Stmt,Txn}WithKey
args: benchdiff "./pkg/sql/tests" "-b" "-r" "BenchmarkParallelSysbench/SQL/3node/oltp_read_write.*" "-d" "10000x" "-c" "20"

running benchmarks:
name                                           old time/op    new time/op    delta
ParallelSysbench/SQL/3node/oltp_read_write-24    1.12ms ± 1%    1.09ms ± 2%   -2.48%  (p=0.000 n=11+11)

name                                           old errs/op    new errs/op    delta
ParallelSysbench/SQL/3node/oltp_read_write-24      0.01 ±11%      0.01 ±17%  +27.94%  (p=0.000 n=12+12)

name                                           old alloc/op   new alloc/op   delta
ParallelSysbench/SQL/3node/oltp_read_write-24    2.05MB ± 0%    2.04MB ± 0%   -0.31%  (p=0.002 n=12+11)

name                                           old allocs/op  new allocs/op  delta
ParallelSysbench/SQL/3node/oltp_read_write-24     8.73k ± 1%     8.65k ± 1%   -0.85%  (p=0.000 n=12+12)

Also, predictably, the mutex disappears from the contention profile:

[Mutex profile with the hack: image]

I think this kind of improvement would generally benefit all SQL gateway nodes on which certain statement fingerprints are very hot, which is common enough, and certainly is common in benchmarks.

I believe it's straightforward to change this code to take RLock by default, and to upgrade to the write lock only when the map entry has to be created for the first time. That's a standard pattern, for example here:

func (s *Store) tryGetOrCreateReplica(
	ctx context.Context,
	rangeID roachpb.RangeID,
	replicaID roachpb.ReplicaID,
	creatingReplica *roachpb.ReplicaDescriptor,
) (_ *Replica, created bool, _ error) {
	// The common case: look up an existing replica.
	if repl, err := s.tryGetReplica(ctx, rangeID, replicaID, creatingReplica); err != nil {
		return nil, false, err
	} else if repl != nil {
		return repl, false, nil
	}
	// No replica currently exists, so try to create one. Multiple goroutines may
	// be racing at this point, so grab a "lock" over this rangeID (represented by
	// s.mu.creatingReplicas[rangeID]) by one goroutine, and retry others.
	s.mu.Lock()
	if _, ok := s.mu.creatingReplicas[rangeID]; ok {
		// Lost the race - another goroutine is currently creating that replica. Let
		// the caller retry so that they can eventually see it.
		s.mu.Unlock()
		return nil, false, errRetry
	}
	s.mu.creatingReplicas[rangeID] = struct{}{}
	s.mu.Unlock()
	// ...

or here (cockroach/pkg/rpc/context.go, lines 2033 to 2053 at b99b2d6):

	k := peerKey{TargetAddr: target, NodeID: remoteNodeID, Class: class}
	if p, ok := rpcCtx.peers.get(k); ok {
		// There's a cached peer, so we have a cached connection, use it.
		return p.c
	}
	// Slow path. Race to create a peer.
	conns := &rpcCtx.peers
	conns.mu.Lock()
	defer conns.mu.Unlock()
	if p, lostRace := conns.getRLocked(k); lostRace {
		return p.c
	}
	// Won race. Actually create a peer.
	if conns.mu.m == nil {
		conns.mu.m = map[peerKey]*peer{}
	}

I haven't checked, but a sync.Map may also be an option here.
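For concreteness, here is a minimal sketch of the RLock-first, double-checked pattern applied to a stats container. The names (container, getOrCreate) are stand-ins rather than the actual ssmemstorage code, and the sketch ignores the fingerprint-limit/throttling checks that the real getStatsForStmtWithKeyLocked performs:

package statsdemo

import "sync"

type stmtStats struct{ /* elided */ }

type container struct {
	mu struct {
		sync.RWMutex
		stmts map[string]*stmtStats
	}
}

func (c *container) getOrCreate(key string) (stats *stmtStats, created bool) {
	// Fast path: most lookups hit an existing entry and only need the read lock.
	c.mu.RLock()
	stats, ok := c.mu.stmts[key]
	c.mu.RUnlock()
	if ok {
		return stats, false
	}

	// Slow path: take the write lock and re-check, since another goroutine may
	// have created the entry between RUnlock and Lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.mu.stmts[key]; ok {
		return existing, false // lost the race to another creator
	}
	if c.mu.stmts == nil {
		c.mu.stmts = map[string]*stmtStats{}
	}
	stats = &stmtStats{}
	c.mu.stmts[key] = stats
	return stats, true
}

A sync.Map would remove the explicit lock from the read path entirely, at the cost of per-entry allocations and a less typed API; which option wins here would have to be measured.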

Either way, this seems like a small effort with a definite upside, so I would recommend taking this on.

xinhaoz self-assigned this Feb 6, 2025

xinhaoz (Member) commented Feb 6, 2025

Thank you! This is being pursued in #140393. I'm extracting changes from that right now, including the lock improvements, but I expect the biggest benefit will come from moving the recording out of the execution path, which will take a bit longer for us to test thoroughly.
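(One possible shape of "moving the recording out of the execution path", purely as a generic illustration and not taken from #140393: hand events to a single background goroutine via a bounded channel, so the query path never touches the shared maps or their lock. All names here, statsEvent and asyncRecorder, are hypothetical.)

package statsdemo

import "time"

type statsEvent struct {
	key     string
	latency time.Duration
}

type asyncRecorder struct {
	ch chan statsEvent
}

// newAsyncRecorder starts a single background goroutine that applies events,
// so the shared stats structures are only touched off the query path.
func newAsyncRecorder(apply func(statsEvent)) *asyncRecorder {
	r := &asyncRecorder{ch: make(chan statsEvent, 1024)}
	go func() {
		for ev := range r.ch {
			apply(ev)
		}
	}()
	return r
}

// Record never blocks the execution path: if the buffer is full the event is
// dropped, a deliberate trade-off for a hot path.
func (r *asyncRecorder) Record(ev statsEvent) {
	select {
	case r.ch <- ev:
	default:
	}
}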
