List of previously found problems (will be updated) #438
Comments
@dgafka Hi! Perhaps in your practice you have experienced the problems described here, which we sometimes see when testing our solution. |
@dgafka Point 6 is particularly interesting... UPD2: There are pending queries to the database: DELETE FROM ecotone_deduplication WHERE handled_at <= $1 |
@dgafka I'll monitor which transactions might remain open for too long and in which cases this happens. But if you can, please check whether rollbacks are being done correctly everywhere and whether the deletion from ecotone_deduplication can be done periodically with a low frequency and definitely not within the main transaction.
SELECT pid, usename, client_addr, application_name, state,
age(clock_timestamp(), xact_start) AS transaction_age,
query, query_start
FROM pg_stat_activity
WHERE state <> 'idle'
AND xact_start IS NOT NULL
AND age(clock_timestamp(), xact_start) > interval '1 minute'
ORDER BY transaction_age DESC; |
If you have transactions enabled for Command Bus, then projection update is atomic to events being appended to stream (projection trigger is wrapped within same transaction). |
@dgafka Thank you for your attention.
3, 5. I am especially worried about this moment since there is a requirement to hold the user's balance during authorizations from his cards, and here we need to immediately update both the balance projection and his account statement in the balance before, balance after format, and making separate streams for each user is not an option. The main problem is that we need to respond quickly, otherwise the provider will decline the card authorization.
|
@dgafka Hi
SELECT pid, usename, client_addr, application_name, state,
age(clock_timestamp(), xact_start) AS transaction_age,
query, query_start
FROM pg_stat_activity
WHERE state <> 'idle'
AND xact_start IS NOT NULL
AND age(clock_timestamp(), xact_start) > interval '1 minute'
ORDER BY transaction_age DESC;
The issue automatically repeats after a long idle period, especially in the RabbitMQ listener. We will now try to limit the consumer's lifetime using the executionTimeLimit argument to one hour. Additionally, we have increased the Kubernetes graceful shutdown period from 30 seconds to 60 seconds — somewhere, a transaction is not being closed properly or a rollback is not being performed. |
Is it possible to implement periodic closing and reopening of the Doctrine connection for consumers? |
@dgafka We found several places where the transaction was not closed correctly in the roadrunner worker. I'll write more detailed results and what we came up with based on optimizations. |
Remember that you can pretty easily build any extension for the Message Consumers yourself. Specific cases do not need to be part of the framework for you to be able to build them :) Extending asynchronous endpoints is pretty straightforward; you can read more in this section. |
I'm wondering if it's possible to affect DbalTransactionInterceptor ( |
Well, that would need to be done through the framework, as those are framework-related classes. But it would be pretty straightforward to roll a customized version of it and disable the framework one. |
@dgafka Why might this situation occur, and wouldn't we expect a synchronous retry by Ecotone\Messaging\Support\ConcurrencyException in the aggregate version conflict? Have you encountered this issue before?
|
@lifinsky as far as I remember, that may happen if, after an SQL exception, another SQL statement is triggered before the rollback is done. |
I understand the technical reason, but there is only an event store and saving to the projection. Perhaps there is a problem with the foreign key or a violation of uniqueness in the projection, but then there should be a rollback? Instant retry is only configured for optimistic exception, so there is no second attempt. On Monday I will look into this in more detail... |
@dgafka Somehow I can't find information about 8 seconds. Probably you still have this default configuration above.
|
@lifinsky yep, you're right, I thought the default was different. So to provide a greater waiting time, it would have to be customized with config. |
@dgafka Is it possible to allow modifying the gap detection retry setting at the service configuration level for each projection independently, or at least for all projections, including those running synchronously? I see that the documentation mentions it for polling projections, but based on the code, it seems that this option is no longer available. |
#[Distributed]
#[EventHandler(listenTo: CardIssued::ROUTING_KEY)]
public function listen(CardIssued $event): void
{
$this->registerCard($event);
$this->createCardTransaction($event);
}
@dgafka Are distributed endpoints wrapped in a transaction, similar to asynchronous ones? In our current example, two command buses are called independently, and we encountered a situation where the first one executed successfully, but the other failed when reading from the event store: it was unable to restore from a snapshot due to a missing BackedEnum converter, which is a very unpleasant situation. It would be great if Ecotone could automatically convert a BackedEnum to a string using its value. As a result, the first command bus persisted its changes to the projection, and now when retrying from the dead letter, we face a unique key duplication error. This is perhaps one of the most pressing problems we are facing. |
<?php
use Ecotone\Messaging\Conversion\Converter;
use Ecotone\Messaging\Conversion\MediaType;
use Ecotone\Messaging\Handler\TypeDescriptor;
use BackedEnum;
use Ecotone\Messaging\Annotation\Converter as MediaTypeConverter;

#[MediaTypeConverter]
class JsonBackedEnumConverter implements Converter
{
    // Matches JSON -> BackedEnum and BackedEnum -> JSON conversions.
    public function matches(TypeDescriptor $sourceType, MediaType $sourceMediaType, TypeDescriptor $targetType, MediaType $targetMediaType): bool
    {
        return ($sourceMediaType->isApplicationJson() && $targetType->isCompatibleWith(TypeDescriptor::create(BackedEnum::class)))
            || ($sourceType->isCompatibleWith(TypeDescriptor::create(BackedEnum::class)) && $targetMediaType->isApplicationJson());
    }

    public function convert($source, TypeDescriptor $sourceType, MediaType $sourceMediaType, TypeDescriptor $targetType, MediaType $targetMediaType)
    {
        // Enum -> scalar: use the backing value.
        if ($source instanceof BackedEnum) {
            return $source->value;
        }
        // Scalar -> enum: call the static from() method of the target enum.
        if (is_string($source) || is_int($source)) {
            return $targetType->toClassReflection()->getMethod('from')->invoke(null, $source);
        }
        throw new \InvalidArgumentException("Invalid value for BackedEnum conversion");
    }
}
Maybe this should work. Currently, an example of the exception we get after restoring from an aggregate snapshot is:
<?php
enum MyEnum: string
{
    case VALUE_ONE = 'one';
    case VALUE_TWO = 'two';
}
$source = 'one';
$targetType = new ReflectionClass(MyEnum::class);
$result = $targetType->getMethod('from')->invoke(null, $source);
echo $result->name; // VALUE_ONE |
A Distributed Command Bus should trigger a single Command Handler, not two. Can you provide a code example of what is actually happening? |
We need to trigger saving the read model in the repository at that place. The command handler is directly in the repository, and the second command saves the event-sourcing aggregate. Of course, we could extract a common service, but I don’t want to do that since the commands are completely different. I was sure that Distributed handler wraps it in a transaction. If that’s not the case, then we need to add the DbalTransaction attribute. |
You forgot to add a Converter for the enum. You should most likely drop the snapshots from the db, because JMS will convert the enum into a custom format, yet won't be able to deserialize it. So if you add a custom converter, its format will most likely differ from what is already in your db. |
So the Distributed Command Handler is in the repo, and then the repo triggers another command? |
No. The distributed event handler is at the application level and triggers two command handlers. |
What do you think about default media type converter for backed enum? At the moment we have many similar converters for each enum and any omission leads to problems with snapshots which of course there is no point in doing too often. |
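As an illustration of what a single generic conversion could look like, here is a minimal plain-PHP sketch that does not touch Ecotone's converter API at all. The enum `CardStatus` and the two helper functions are hypothetical names introduced only for this example; it relies solely on the built-in `BackedEnum` interface (PHP 8.1+).

```php
<?php

// Hypothetical enum used only for this illustration.
enum CardStatus: string
{
    case Active = 'active';
    case Blocked = 'blocked';
}

/** Serialize any backed enum to its scalar backing value. */
function backedEnumToScalar(BackedEnum $enum): int|string
{
    return $enum->value;
}

/**
 * Restore a backed enum of the given class from its scalar value.
 * BackedEnum::from() throws a ValueError when the value is unknown.
 */
function scalarToBackedEnum(string $enumClass, int|string $value): BackedEnum
{
    if (! is_subclass_of($enumClass, BackedEnum::class)) {
        throw new InvalidArgumentException("{$enumClass} is not a backed enum");
    }

    return $enumClass::from($value);
}

// Round trip: enum -> stored scalar -> enum.
$stored = backedEnumToScalar(CardStatus::Blocked);
$restored = scalarToBackedEnum(CardStatus::class, $stored);
```

Because the check is against the `BackedEnum` interface rather than a concrete enum, one pair of functions like this would cover every backed enum, which is the property a default converter would need.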
@dgafka Hi. Thank you for enum support and safety snapshots. |
If we work on resolving and removing deprecation warnings, the framework is getting closer and closer to becoming a core addition, especially for Symfony projects. A huge amount of work was done in 2024 and continues this year. I am also putting in maximum effort to implement and promote it for production projects. |
Distributed is related only to "allowing" a given Handler to be executed by the Distributed Bus. Calling two command handlers inside will actually trigger the Transaction Interceptor too, because it hooks on both. Therefore, if transactional wrapping does not happen, it's a bug, or maybe you're using different connections. |
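To make the failure mode from this part of the thread concrete, here is a minimal sketch of why both writes need to share one transaction. It uses plain PDO with an in-memory SQLite database (assuming the pdo_sqlite extension is available) instead of Ecotone or Doctrine; the tables and the `handleCardIssued` function are hypothetical stand-ins for the read model and the second command.

```php
<?php

// Hypothetical in-memory schema standing in for the read model and the second write.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE cards (id TEXT PRIMARY KEY)');
$pdo->exec('CREATE TABLE card_transactions (id INTEGER PRIMARY KEY AUTOINCREMENT, card_id TEXT NOT NULL)');

/**
 * Runs both steps in one transaction so that a failure in the second
 * step also undoes the first one.
 */
function handleCardIssued(PDO $pdo, string $cardId, bool $failSecondStep): void
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('INSERT INTO cards (id) VALUES (?)')->execute([$cardId]);

        if ($failSecondStep) {
            // Stands in for the snapshot deserialization failure described in the thread.
            throw new RuntimeException('second step failed');
        }

        $pdo->prepare('INSERT INTO card_transactions (card_id) VALUES (?)')->execute([$cardId]);
        $pdo->commit();
    } catch (Throwable $e) {
        // Roll back before any other statement touches the connection.
        $pdo->rollBack();
        throw $e;
    }
}

try {
    handleCardIssued($pdo, 'card-1', true);
} catch (RuntimeException $e) {
    // First attempt failed; nothing should have been persisted.
}
$rowsAfterFailure = (int) $pdo->query('SELECT COUNT(*) FROM cards')->fetchColumn();

// The retry does not hit a unique key violation because the first insert was rolled back.
handleCardIssued($pdo, 'card-1', false);
$rowsAfterRetry = (int) $pdo->query('SELECT COUNT(*) FROM cards')->fetchColumn();
```

If the two inserts ran in separate transactions, the first row would survive the failure and the retry would fail on the primary key, which matches the duplication error reported when replaying from the dead letter.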
Ecotone version(s) affected: latest
Description
1. If there is an error in the message payload converter for a distributed endpoint, then the message does not end up in the dead letter. Let's try to write a test for this.
2. When several event sourcing aggregates and projections for them are processed in an asynchronous endpoint, then in the case of an incorrect (already renamed) event class name in one of the streams with a delayed retry, we get many records in the projections without the changes themselves in the stream - it feels like a complete rollback of the transaction is not happening. We plan to also write a test to reproduce this.
3. Our projections all work synchronously with aggregates, while they receive a strange status in the projections table, either idle or running, for the same events.
4. How to solve a situation when some exceptions should go to delayed retry, while others should be processed through a custom ServiceActivator handler? For now we are making one common ServiceActivator for the error channel and looking at the exception class there - but perhaps it’s worth providing a better option?
5. For synchronous projection, what will happen in the case of gap detection (is such a scenario possible when the projection lags behind the event sourcing aggregate stream)? Can't there be a situation where the aggregate stream is updated after a sync retry (for example, due to OptimisticLockException), but the projection remains unupdated and waits for the next events in the stream?
6. There was a case where an AMQP consumer got stuck on an event-sourcing aggregate command, and in the queue, it was visible that the message was received but not acknowledged. The last log message was: Executing Command Handler...
It’s possible that this behavior is somehow related to defaultMemoryLimit: 256. It's hard to determine the exact cause since this happened only once so far. The first pod restart didn’t help, but after I completely deleted the deployment and started a new pod, all messages were processed successfully.
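The workaround described for the error channel, one common handler that inspects the exception class, can be sketched in plain PHP roughly as follows. This is only an illustration of the routing idea, not Ecotone's API: the exception classes, the handler names, and the `routeFailure` function are all hypothetical.

```php
<?php

// Hypothetical exception types used only for this illustration.
final class ProviderTimeout extends RuntimeException {}
final class InvalidPayload extends InvalidArgumentException {}

/**
 * Picks a handler name for a failure by walking a class => handler map
 * in order, falling back to a default when nothing matches.
 *
 * @param array<class-string<Throwable>, string> $routes
 */
function routeFailure(Throwable $failure, array $routes, string $default): string
{
    foreach ($routes as $exceptionClass => $handler) {
        if ($failure instanceof $exceptionClass) {
            return $handler;
        }
    }

    return $default;
}

$routes = [
    ProviderTimeout::class => 'delayedRetry',
    InvalidPayload::class  => 'customHandler',
];

$timeoutRoute = routeFailure(new ProviderTimeout('timeout'), $routes, 'deadLetter');
$payloadRoute = routeFailure(new InvalidPayload('bad payload'), $routes, 'deadLetter');
$unknownRoute = routeFailure(new LogicException('unexpected'), $routes, 'deadLetter');
```

Keeping the mapping as data makes it easy to see which exceptions go to delayed retry and which go elsewhere, but the order of entries matters when one exception class extends another.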