Unexpected behaviour on snapshot #389

Closed
niklasuhrberg opened this issue Sep 11, 2024 · 10 comments

@niklasuhrberg

I have reproduced what seems to be erroneous behaviour using the steps below:

  1. Start two servers A and B. B disables cache.

  2. Snapshot created on closing A.

  3. Snapshot now has version N and last event has version N-1

  4. Create one more event using already running service B => new event gets version N. Now both the snapshot and the last event have the same version, but the snapshot does not include the information in the event with version number N.

  5. Get the state from service B (using Backend.repository.get(id)) => the state does not contain the information from the event issued just before, in step 4.

  6. Start another instance C and get state => result does not include event with version N

  7. Issuing a command also fails; no further events get persisted. This applies to both servers.

  8. If the version number of the snapshot is now lowered by one, the services retrieve the correct state, i.e. taking all events into account.

  9. But, issuing commands still fails with the log "service raised an error: class edomata.backend.BackendError$MaxRetryExceeded$"

8 was anticipated, but I was surprised by 9.

The service becomes unusable once this state has materialized.

Contributor

Thank you for submitting this issue and welcome to the edomata community!
We are glad that you are here and we appreciate your contributions.

@hnaderi
Owner

hnaderi commented Sep 12, 2024

Hi @niklasuhrberg

  • Did you disable cache on all the instances, or just on B?
  • The maximum snapshot version should be the number of events for the entity in the journal. If you lower it by one, the snapshot claims to be the state before the last one, so that stream will see wrong states and the timeline is effectively corrupted.
  • The servers that disable the cache should be able to work concurrently. If there is a server with the default cache enabled in a distributed setup, those instances might face the problem you mentioned. That's because they don't read the snapshot from storage again once they have it in memory; they trust their own data, which causes conflicts in the decision-making process.
  • I couldn't reproduce the problem using the example project; here is how I tried. As expected, it doesn't work with the cache enabled, and it works without the cache with no problems. Can you please check whether that's the case for you and, if not, post a minimal reproduction scenario like this?
package dev.hnaderi.example

import cats.data.EitherNec
import cats.effect.IO
import cats.effect.IOApp
import cats.effect.kernel.Resource
import dev.hnaderi.example.accounts.*
import edomata.core.CommandMessage

import java.time.Instant
import java.util.UUID

extension [T](e: IO[T]) {
  def print(lbl: String) = IO.println(lbl) >> e.flatTap(IO.println)
}

object Main extends IOApp.Simple {

  private def newID = UUID.randomUUID().toString()

  def run: IO[Unit] = Resource
    .both(Application[IO](), Application[IO]())
    .use((app1, app2) =>
      for {
        address <- IO.randomUUID.map(_.toString()).print("ADDRESS:")
        printA = app1.accounts.storage.repository
          .get(address)
          .print("STATE ON A")
        printB = app2.accounts.storage.repository
          .get(address)
          .print("STATE ON B")

        _ <- app1.accounts
          .service(
            CommandMessage(
              newID,
              Instant.now(),
              address,
              Command.Open
            )
          )
          .print("OPEN ON A")

        _ <- printA

        _ <- app2.accounts
          .service(
            CommandMessage(
              newID,
              Instant.now(),
              address,
              Command.Deposit(100)
            )
          )
          .print("DEPOSIT ON B")

        _ <- printB
        _ <- printA

        _ <- app1.accounts
          .service(
            CommandMessage(
              newID,
              Instant.now(),
              address,
              Command.Deposit(50)
            )
          )
          .print("DEPOSIT ON A")

        _ <- printB
        _ <- printA
      } yield ()
    )
}
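
The stale-cache conflict described in the third bullet above can be pictured with a toy model. This is only a sketch under stated assumptions: `Journal` and `Instance` are hypothetical names, not edomata's API, and the rule that an append is accepted only when the event version equals the number of stored events is an assumption made for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy journal, not edomata's implementation.
// Assumption: an append succeeds only if the event version equals
// the number of events already stored for the entity.
final class Journal {
  private val events = ArrayBuffer.empty[Long]
  def size: Long = events.size.toLong
  def append(eventVersion: Long): Boolean =
    if (eventVersion == size) { events += eventVersion; true } else false
}

// Hypothetical service instance. With the cache on, it trusts the state
// version it holds in memory and never re-reads the journal.
final class Instance(journal: Journal, useCache: Boolean) {
  private var cachedVersion: Option[Long] = None

  private def currentVersion: Long =
    if (useCache) cachedVersion.getOrElse(journal.size) else journal.size

  // Decide on the current state version and try to persist one event.
  def command(): Boolean = {
    val v = currentVersion
    val ok = journal.append(v)
    if (useCache) cachedVersion = Some(if (ok) v + 1 else v)
    ok
  }
}
```

In this model, if a cached instance and an uncached instance both write to the same entity, the cached one keeps deciding on an outdated version and every later append from it is rejected, while the uncached one keeps working.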

@hnaderi hnaderi self-assigned this Sep 12, 2024
@niklasuhrberg
Author

"Did you disable cache on all the instances, or just on B?"

Just on B.

@niklasuhrberg
Author

"The maximum snapshot version should be the number of events for the entity in the journal"

Is this consistent with:

  1. Start two servers A and B. B disables cache.

  2. Snapshot created on closing A.

  3. Snapshot now has version N and last event has version N-1

I interpret your statement as saying that the version should be the same for the snapshot and the last event.
Note that when I shut down server A (using the default cache), I had only issued requests to server A, not B.

@hnaderi
Owner

hnaderi commented Sep 12, 2024

I interpret your statement as saying that the version should be the same for the snapshot and the last event. Note that when I shut down server A (using the default cache), I had only issued requests to server A, not B.

No, the event versioning starts from zero and snapshots from one. It might seem confusing at first, but it has a simple logic behind it:

  • States with version zero are the same as an empty entity (initial state, so they don't get persisted as they aren't created yet)
  • Each event version points to the state version that it was decided based on. So if you see state version N and decide to emit an event, the event version would be N, the new state version would become N+1 and thus the total number of events is N+1
  • In other words, each event with version V is the transition from state version V to state version V+1
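
The versioning rule above can be written down as a small toy model. It is a sketch only: `Event`, `State` and `Versioning` are hypothetical names rather than edomata's types, and `restore` shows what the rule implies for reading a snapshot, not how the library's journal reader is implemented.

```scala
// Toy model of the versioning rule; all names here are hypothetical,
// not edomata's API.
final case class Event(version: Long, amount: Int)
final case class State(version: Long, balance: Int)

object Versioning {
  // Version zero is the empty entity and is never persisted.
  val empty: State = State(version = 0L, balance = 0)

  // Deciding on state version N emits an event with version N.
  def decide(s: State, amount: Int): Event = Event(s.version, amount)

  // Event version V is the transition from state version V to V + 1.
  def applyEvent(s: State, e: Event): State = {
    require(
      e.version == s.version,
      s"event ${e.version} cannot follow state ${s.version}"
    )
    State(s.version + 1, s.balance + e.amount)
  }

  // Restore from a snapshot by replaying every event whose version is
  // at or after the snapshot version.
  def restore(snapshot: State, journal: Seq[Event]): State =
    journal
      .filter(_.version >= snapshot.version)
      .foldLeft(snapshot)(applyEvent)
}
```

Under this model, a snapshot with version N next to a last event with version N-1 is the normal situation, and a later event with version N is exactly the one that must be replayed on top of that snapshot.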

@niklasuhrberg
Author

Ok, thanks.

But what about item 4:
4. Create one more event using already running service B => new event gets version N. Now both the snapshot and the last event have the same version, but the snapshot does not include the information in the event with version number N.

Note that this service (B) has not yet been involved at all and it does not use caching.
Still, the observable results are as described in 4 and 5.

I hear that you were not able to recreate the results, but I also cannot see how your application does the same as my setup. (I will take a closer look later!)
My setup does not violate the constraint that a service using caching must be the single writer for a specific entity. Indeed, it is the single writer until it is shut down.

@hnaderi
Owner

hnaderi commented Sep 12, 2024

@niklasuhrberg I finally managed to reproduce the problem. I haven't found the root cause yet.
I will keep you updated.

@niklasuhrberg
Author

Thanks a lot for that!

@hnaderi
Owner

hnaderi commented Sep 12, 2024

@niklasuhrberg The problem is now fixed with v0.12.4.
It was a bug introduced in one of the previous versions, when I refactored the journal reader and didn't notice that the test covering this case was no longer there!
Finding the bug didn't take much time, but finding where it was introduced, and accepting that such a silly bug had slipped through, wasn't easy.
Anyway, thank you for the report and extra thanks for giving the steps to reproduce it!

@niklasuhrberg
Author

@hnaderi
Good to hear and I'm glad my report was productive.

Thanks so much for the fast response and fix.

@hnaderi hnaderi closed this as completed Sep 12, 2024
2 participants