Skip to content

fix(backend): deadlocks #3320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Mar 24, 2025
Merged

fix(backend): deadlocks #3320

merged 22 commits into from
Mar 24, 2025

Conversation

BlairCurrey
Copy link
Contributor

@BlairCurrey BlairCurrey commented Feb 24, 2025

Changes proposed in this pull request

  • fixes deadlocks by either using transaction after opening one, or moving calls before/after the commit
    • changes quote creation to not create then patch. which required some small changes to the expiryDate (base it off Date.now() before creating instead of quote.createdAt)
  • adds create local outgoing payment k6 script
  • adds caching to fees

Context

fixes #3319 #3226

Checklist

  • Related issues linked using fixes #number
  • Tests added/updated
  • Make sure that all checks pass
  • Bruno collection updated (if necessary)
  • Documentation issue created with user-docs label (if necessary)
  • OpenAPI specs updated (if necessary)

- fixes deadlocks by ensuring we dont open new connections before resolving open transactions
- forced knex connections to 1 to help find these cases
@github-actions github-actions bot added type: tests Testing related pkg: backend Changes in the backend package. type: source Changes business logic labels Feb 24, 2025
Copy link

netlify bot commented Feb 24, 2025

Deploy Preview for brilliant-pasca-3e80ec ready!

Name Link
🔨 Latest commit 4eb012f
🔍 Latest deploy log https://app.netlify.com/sites/brilliant-pasca-3e80ec/deploys/67ddbab5ec4d93000812da3f
😎 Deploy Preview https://deploy-preview-3320--brilliant-pasca-3e80ec.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

- payment returned from createOutgoingPayment previously
had walletAddress of undefined
(coming from withGraphFetched(quote)).
When assigning manually it included wallet address.
Updated get methods to include wallet address on quote as well.
Comment on lines 180 to 193
// TODO: should getQuote happen inside trx? wasnt in main (was inside but not using trx).
// If so, in the getQuote method, need to not only pass into IlpQuoteDetails but also connector.
// Probably should have IlpQuoteDetails usingt he trx but not sure about the rest (just
// including the IlpQuoteDetails insert would prly require refactor to to that here)
const quote = await deps.paymentMethodHandlerService.getQuote(
paymentMethod,
{
quoteId,
walletAddress,
receiver,
receiveAmount: options.receiveAmount,
debitAmount: options.debitAmount
}
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something still needs to be done here one way or another.

  1. return ilp quote details from get quote and save in transaction with quote insert. This assumes the ilp connector db calls do not need to use the transaction (they are not using it on main). This should be more performant as it greatly reduces the duration of the transaction. However it does leak the ilp payment method concern into quotes.

  2. Do this in the transaction. Pass the transaction into the connector and use for all the db calls.

Originally we called this inside the transaction. The transaction was passed in but only used to save the IlpQuoteDetails. It was not used by the connector. This getQuote is a heavy operation (for ilp payments) because of all the db/network calls so it would be nice not to include it in the transaction provided its safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea for 1. is a new saveAdditionalDetails(trx: Knex.Transction, quote: Quote) method on the paymentMethodHandlerService that inserts into IlpQuoteDetails for the ilp paymnet method, and does nothing for local payments. Keeps the ilp quote details encapsulated in the ilp payment method handler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

ilp quote details encapsulated in the ilp payment method handler.

is a good approach: Do you think if we start a new transaction before we call this, and then pass in an optional trx into the .getQuote method, it be a perf issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a good approach: Do you think if we start a new transaction before we call this, and then pass in an optional trx into the .getQuote method, it be a perf issue?

First, if we do this we need to use this transaction everywhere inside getQuote (including connector). Currently, and on main, this is not done inside the transaction. This would be required to prevent the same deadlocking issue at max connections.

I think that should determine the direction here. If everything in getQuote needs to start using the transaction we should pass it into getQuote and pass into the connector and use it. Otherwise we shouldn't call getQuote within the lifecycle of the transaction because it unnecessarily elongates the transaction. The extra work would be the ~11 (?) network calls between rafikis in the connector to get the quote. At that point the question is how to ensure the ilpQuoteDetails are still saved inside the transaction. Directly call the IlpQuoteDetails insert here or
some new paymentMethodHandlerService. saveAdditionalDetails(trx: Knex.Transction, quote: Quote) to expose it.

Copy link
Contributor Author

@BlairCurrey BlairCurrey Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we just discussed in our call, we're fine with the current behavior (not inside trx) but will still need to ensure the IlpQuoteDetails is happening inside the transaction. I think the new saveAdditionalDetails method will work fine for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further discussion we came to the conclusion that we dont need with ilp quote details and the quote. If there is a failure between the details insert and the quote insert it will not create any sort of consistency issues and will just exist in our db as something that happened during ilp quoting (although not utilized).

quote: UnfinalizedQuote,
receiver: Receiver
): Date {
const quoteExpiry = new Date(Date.now() + deps.config.quoteLifespan)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creates an expiry of now, instead of createdAt (since we are setting ahead of creation). This was to allow us to simply create the quote and not create then update it.

@BlairCurrey BlairCurrey marked this pull request as ready for review February 27, 2025 15:43
Copy link
Contributor

@mkurapov mkurapov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement

@@ -347,13 +357,36 @@ function calculateFixedSendQuoteAmounts(
}
}

function calculateExpiry(
deps: ServiceDependencies,
quote: UnfinalizedQuote,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quote is unused

}

async function getWalletAddressPage(
deps: ServiceDependencies,
options: ListOptions
): Promise<Quote[]> {
const quotes = await Quote.query(deps.knex)
.list(options)
.withGraphFetched('fee')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, because we are not using fees in the GraphQL resolvers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That, and if I recall correctly there were some inconsistencies with the centralized getQuote, getPage tests. Which expect the get methods and create methods to return the same object. As I recall the fee is no longer returned from the create. Or it's null instead of undefined.

Comment on lines 180 to 193
// TODO: should getQuote happen inside trx? wasnt in main (was inside but not using trx).
// If so, in the getQuote method, need to not only pass into IlpQuoteDetails but also connector.
// Probably should have IlpQuoteDetails usingt he trx but not sure about the rest (just
// including the IlpQuoteDetails insert would prly require refactor to to that here)
const quote = await deps.paymentMethodHandlerService.getQuote(
paymentMethod,
{
quoteId,
walletAddress,
receiver,
receiveAmount: options.receiveAmount,
debitAmount: options.debitAmount
}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

ilp quote details encapsulated in the ilp payment method handler.

is a good approach: Do you think if we start a new transaction before we call this, and then pass in an optional trx into the .getQuote method, it be a perf issue?

@@ -22,7 +22,7 @@ export class Quote extends WalletAddressSubresource {

public estimatedExchangeRate!: number
public feeId?: string
public fee?: Fee
public fee?: Fee | null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the null used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to do with how it was populated in different scenarios. If it was joined but there was no fee vs not joined and maybe some other scenarios. Was causing problems in the centralized getPage tests or similar as I recall.

Comment on lines +248 to +250
return tracer.startActiveSpan(
'outgoingPaymentService.createOutgoingPayment',
async (span: Span) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

throw err
} finally {
stopTimerOP()
span.end()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to explicitly end the span? or does it do it by itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears so, yes.

Given this example from the docs:

export function rollTheDice(rolls: number, min: number, max: number) {
  // Create a span. A span must be closed.
  return tracer.startActiveSpan('rollTheDice', (span: Span) => {
    const result: number[] = [];
    for (let i = 0; i < rolls; i++) {
      result.push(rollOnce(min, max));
    }
    // Be sure to end the span!
    span.end();
    return result;
  });
}

And some testing. I removed the span.end() and no longer saw this outgoingPaymentService.createOutgoingPayment span.

@mkurapov mkurapov linked an issue Mar 18, 2025 that may be closed by this pull request
1 task
mkurapov
mkurapov previously approved these changes Mar 21, 2025
Copy link

🚀 Performance Test Results

Test Configuration:

  • VUs: 4
  • Duration: 1m0s

Test Metrics:

  • Requests/s: 44.23
  • Iterations/s: 14.76
  • Failed Requests: 0.00% (0 of 2661)
📜 Logs

> [email protected] run-tests:testenv /home/runner/work/rafiki/rafiki/test/performance
> ./scripts/run-tests.sh -e test "-k" "-q" "--vus" "4" "--duration" "1m"

Cloud Nine GraphQL API is up: http://localhost:3101/graphql
Cloud Nine Wallet Address is up: http://localhost:3100/
Happy Life Bank Address is up: http://localhost:4100/
cloud-nine-wallet-test-backend already set
cloud-nine-wallet-test-auth already set
happy-life-bank-test-backend already set
happy-life-bank-test-auth already set
     data_received..................: 928 kB 15 kB/s
     data_sent......................: 1.9 MB 32 kB/s
     http_req_blocked...............: avg=6.73µs   min=2.34µs  med=5.01µs   max=1.49ms   p(90)=6.1µs    p(95)=6.61µs  
     http_req_connecting............: avg=451ns    min=0s      med=0s       max=588.91µs p(90)=0s       p(95)=0s      
     http_req_duration..............: avg=89.81ms  min=8.43ms  med=75.88ms  max=569.36ms p(90)=151.81ms p(95)=176.67ms
       { expected_response:true }...: avg=89.81ms  min=8.43ms  med=75.88ms  max=569.36ms p(90)=151.81ms p(95)=176.67ms
     http_req_failed................: 0.00%  ✓ 0         ✗ 2661
     http_req_receiving.............: avg=79.01µs  min=23.51µs med=68.97µs  max=2.28ms   p(90)=101.04µs p(95)=134.43µs
     http_req_sending...............: avg=34.82µs  min=10.78µs med=26.16µs  max=2.38ms   p(90)=38.07µs  p(95)=53.1µs  
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s       max=0s       p(90)=0s       p(95)=0s      
     http_req_waiting...............: avg=89.69ms  min=8.28ms  med=75.76ms  max=568.9ms  p(90)=151.68ms p(95)=176.57ms
     http_reqs......................: 2661   44.232305/s
     iteration_duration.............: avg=270.75ms min=166.8ms med=256.31ms max=1.1s     p(90)=330.91ms p(95)=366.81ms
     iterations.....................: 888    14.760724/s
     vus............................: 4      min=4       max=4 
     vus_max........................: 4      min=4       max=4 

@BlairCurrey BlairCurrey requested a review from mkurapov March 23, 2025 03:36
@BlairCurrey BlairCurrey merged commit cbd3a0c into main Mar 24, 2025
44 checks passed
@BlairCurrey BlairCurrey deleted the bc/fix-deadlocks branch March 24, 2025 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: backend Changes in the backend package. type: source Changes business logic type: tests Testing related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Database Deadlocks Local payments performance script
2 participants