Skip to content

Conversation

@manzhizhen
Copy link
Contributor

Ensure that rpcStatus.maxElapsed, rpcStatus.succeededMaxElapsed, and rpcStatus.failedMaxElapsed are correctly replaced through cas.

Fixes: #3006

…ed, and rpcStatus.failedMaxElapsed are correctly replaced through cas.

Fixes: #3006
Signed-off-by: yizhenqiang <[email protected]>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a race condition in RPC status tracking by replacing unsafe atomic operations with proper Compare-And-Swap (CAS) loops for updating maximum elapsed time fields.

  • Replaces non-atomic read-then-store operations with CAS loops to prevent race conditions
  • Ensures thread-safe updates to maxElapsed, succeededMaxElapsed, and failedMaxElapsed fields
  • Maintains the same logical behavior while fixing concurrency issues

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 173 to 181
for {
oldValue := atomic.LoadInt64(&rpcStatus.succeededMaxElapsed)
if oldValue >= elapsed {
break
}
if atomic.CompareAndSwapInt64(&rpcStatus.succeededMaxElapsed, oldValue, elapsed) {
break
}
}
Copy link

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CAS loop could potentially spin indefinitely under high contention. Consider adding a limit to the number of retry attempts or using a backoff strategy to prevent excessive CPU usage.

Copilot uses AI. Check for mistakes.
Comment on lines 162 to 199
for {
oldValue := atomic.LoadInt64(&rpcStatus.maxElapsed)
if oldValue >= elapsed {
break
}
if atomic.CompareAndSwapInt64(&rpcStatus.maxElapsed, oldValue, elapsed) {
break
}
}

if succeeded {
if rpcStatus.succeededMaxElapsed < elapsed {
atomic.StoreInt64(&rpcStatus.succeededMaxElapsed, elapsed)
for {
oldValue := atomic.LoadInt64(&rpcStatus.succeededMaxElapsed)
if oldValue >= elapsed {
break
}
if atomic.CompareAndSwapInt64(&rpcStatus.succeededMaxElapsed, oldValue, elapsed) {
break
}
}

atomic.StoreInt32(&rpcStatus.successiveRequestFailureCount, 0)
} else {
atomic.StoreInt64(&rpcStatus.lastRequestFailedTimestamp, CurrentTimeMillis())
atomic.AddInt32(&rpcStatus.successiveRequestFailureCount, 1)
atomic.AddInt32(&rpcStatus.failed, 1)
atomic.AddInt64(&rpcStatus.failedElapsed, elapsed)
if rpcStatus.failedMaxElapsed < elapsed {
atomic.StoreInt64(&rpcStatus.failedMaxElapsed, elapsed)

for {
oldValue := atomic.LoadInt64(&rpcStatus.failedMaxElapsed)
if oldValue >= elapsed {
break
}
if atomic.CompareAndSwapInt64(&rpcStatus.failedMaxElapsed, oldValue, elapsed) {
break
}
}
}
Copy link

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CAS loop pattern is duplicated three times with identical logic. Consider extracting this into a helper function like updateMaxElapsed(field *int64, elapsed int64) to reduce code duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
… and rpcStatus.failedMaxElapsed are correctly replaced through cas.

Fixes: #3006
Signed-off-by: yizhenqiang <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Sep 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 39.61%. Comparing base (3ca11da) to head (aa773be).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3010      +/-   ##
===========================================
- Coverage    39.63%   39.61%   -0.03%     
===========================================
  Files          457      457              
  Lines        38870    38876       +6     
===========================================
- Hits         15408    15402       -6     
- Misses       22201    22214      +13     
+ Partials      1261     1260       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@AlexStocks
Copy link
Contributor

maybe you can add an ut for your PR.

@manzhizhen
Copy link
Contributor Author

maybe you can add an ut for your PR.

I try it

…rpcStatus.failedMaxElapsed are correctly replaced through cas. Add UT.

Fixes: #3006
@sonarqubecloud
Copy link

sonarqubecloud bot commented Sep 7, 2025

@marsevilspirit
Copy link
Member

I'm afraid this will reduce the performance of dubbo-go RPC calls.

@manzhizhen
Copy link
Contributor Author

manzhizhen commented Sep 8, 2025

I'm afraid this will reduce the performance of dubbo-go RPC calls.

I understand that 99% of requests will break here:
oldValue := atomic.LoadInt64(curValue) if oldValue >= newMaxValue { break }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants