Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: mls 1on1 race condition [WPB-15395] 🍒 #3239

Merged
merged 2 commits into from
Jan 24, 2025

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Jan 22, 2025

BugWPB-15395 [Android] 1:1 conversation throwing decryption errors after mls was enabled for the team

This PR was automatically cherry-picked based on the following PR:

Original PR description:



PR Submission Checklist for internal contributors

  • The PR Title

    • conforms to the style of semantic commits messages¹ supported in Wire's Github Workflow²
    • contains a reference JIRA issue number like SQPIT-764
    • answers the question: If merged, this PR will: ... ³
  • The PR Description

    • is free of optional paragraphs and you have filled the relevant parts to the best of your ability

What’s New in This PR?

This PR addresses a race condition during 1:1 MLS conversation creation and improves logging throughout the MLS-related codebase.

Issues

  • Race condition: When a user opens a 1:1 conversation at the same time a slow sync process is attempting to (re)join the same MLS conversation, we could end up with errors like:
    • ConversationAlreadyExists
    • Cannot execute operation because a pending commit exists
  • Lack of logs: Troubleshooting such concurrency issues was complicated by insufficient logging in some MLS-related operations.

Causes

  • Concurrent initiation of MLS commits:
    • The slow sync step attempts to join existing MLS conversations.
    • At the same time, the UI notifies that a conversation is open (NotifyConversationIsOpenUseCase), which also tries to resolve or establish a 1:1 MLS group.
    • Because both operations could run in parallel, we ended up with a “pending commit” blocking any subsequent commits.

Solutions

  1. Waiting for Slow Sync

    • In NotifyConversationIsOpenUseCaseImpl, we now explicitly wait for SlowSyncStatus to reach Complete before reevaluating the protocol for a 1:1 conversation.
    • This ensures no concurrent MLS commits are triggered during slow sync.
  2. Improved Logging

    • Added detailed log statements (kaliumLogger.d(...)) in JoinExistingMLSConversationUseCase, MLSConversationDataSource, and NotifyConversationIsOpenUseCaseImpl.
    • These logs help to trace when a commit bundle is sent or processed, and when establishing/joining self and 1:1 MLS groups.
  3. Failure Handling (Wipe on Add Member Failure)

    • In MLSConversationDataSource, if adding members to an MLS group fails, we attempt a conversation wipe.
    • This helps to avoid a stuck state if the MLS conversation remains in a “pending commit” state and can’t be recovered via normal commits.

* fix: race condition during 1on1 mls creation, more logs

* scope init fix

* test fix

* added ignore on welcome message when conv already exist, reverted wipe on member join

* added warning
@github-actions github-actions bot added cherry-pick PR is cherry-picking changes from another banch echoes: product-roadmap/bug Work contributing to resolve a bug not critical enough to have raised an incident. type: bug / fix 🐞 👕 size: S 🚨 Potential breaking changes labels Jan 22, 2025
@Garzas Garzas enabled auto-merge January 24, 2025 10:44
Copy link
Contributor Author

github-actions bot commented Jan 24, 2025

Test Results

3 408 tests  ±0   3 300 ✅ ±0   5m 52s ⏱️ +21s
  584 suites ±0     108 💤 ±0 
  584 files   ±0       0 ❌ ±0 

Results for commit 64d664c. ± Comparison against base commit 7456f22.

♻️ This comment has been updated with latest results.

Copy link
Contributor Author

@Garzas Garzas added this pull request to the merge queue Jan 24, 2025
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 75.00000% with 10 lines in your changes missing coverage. Please review.

Project coverage is 54.44%. Comparing base (7456f22) to head (64d664c).

Files with missing lines Patch % Lines
...nc/receiver/conversation/MLSWelcomeEventHandler.kt 60.00% 3 Missing and 1 partial ⚠️
...re/conversation/NotifyConversationIsOpenUseCase.kt 81.25% 2 Missing and 1 partial ⚠️
...um/logic/feature/conversation/ConversationScope.kt 0.00% 2 Missing ⚠️
...conversation/JoinExistingMLSConversationUseCase.kt 80.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3239   +/-   ##
========================================
  Coverage    54.43%   54.44%           
========================================
  Files         1271     1271           
  Lines        37062    37092   +30     
  Branches      3764     3766    +2     
========================================
+ Hits         20176    20195   +19     
- Misses       15466    15474    +8     
- Partials      1420     1423    +3     
Files with missing lines Coverage Δ
...gic/data/conversation/MLSConversationRepository.kt 84.05% <100.00%> (+0.16%) ⬆️
...conversation/JoinExistingMLSConversationUseCase.kt 80.95% <80.00%> (+0.46%) ⬆️
...um/logic/feature/conversation/ConversationScope.kt 0.00% <0.00%> (ø)
...re/conversation/NotifyConversationIsOpenUseCase.kt 90.32% <81.25%> (-9.68%) ⬇️
...nc/receiver/conversation/MLSWelcomeEventHandler.kt 85.93% <60.00%> (-3.16%) ⬇️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7456f22...64d664c. Read the comment docs.

@datadog-wireapp
Copy link

Datadog Report

Branch report: fix/mls-1on1-race-condition-cherry-pick
Commit report: 8092e9e
Test service: kalium-jvm

✅ 0 Failed, 3300 Passed, 108 Skipped, 1m 2.02s Total Time

Merged via the queue into develop with commit 7f85d16 Jan 24, 2025
23 checks passed
@Garzas Garzas deleted the fix/mls-1on1-race-condition-cherry-pick branch January 24, 2025 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick PR is cherry-picking changes from another banch echoes: product-roadmap/bug Work contributing to resolve a bug not critical enough to have raised an incident. 🚨 Potential breaking changes 👕 size: S type: bug / fix 🐞
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants