Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't WaitForResourceAsync KnownResourceStates.Finished #4878

Closed
1 task done
eerhardt opened this issue Jul 12, 2024 · 34 comments
Closed
1 task done

Can't WaitForResourceAsync KnownResourceStates.Finished #4878

eerhardt opened this issue Jul 12, 2024 · 34 comments
Assignees
Milestone

Comments

@eerhardt
Copy link
Member

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

It's not possible to get notification when a resource has been stopped/finished when after await app.StopAsync();.

I'm trying to write a test that uses a Docker Volume, and when the test is done, it deletes the volume. However, the volume is still in use even after await app.StopAsync(); returns, which means the container using the volume is still running.

I'd like a way to get notified when the resource/container is finished so I can delete the volume.

One reason this isn't possible today is that WaitForResourceAsync cancels once the app is stopping here:

public async Task<string> WaitForResourceAsync(string resourceName, IEnumerable<string> targetStates, CancellationToken cancellationToken = default)
{
using var watchCts = CancellationTokenSource.CreateLinkedTokenSource(_applicationStopping, cancellationToken);
var watchToken = watchCts.Token;
await foreach (var resourceEvent in WatchAsync(watchToken).ConfigureAwait(false))

Once the app is stopping, it cancels this loop, which means we won't get more notifications about the resource.

Expected Behavior

I'd like to get a notification when the container is finished.

Steps To Reproduce

await using var app = await appHost.BuildAsync();
var resourceNotificationService = app.Services.GetRequiredService<ResourceNotificationService>();
await app.StartAsync();

await app.StopAsync();

await resourceNotificationService.WaitForResourceAsync("webfrontend", KnownResourceStates.Finished).WaitAsync(TimeSpan.FromSeconds(30));

Exceptions (if any)

No response

.NET Version info

No response

Anything else?

cc @DamianEdwards @karolz-ms @mitchdenny

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication label Jul 12, 2024
@karolz-ms
Copy link
Member

@eerhardt and I chatted. We think the main issue is that as soon as app.StopAsync() is called, the resource service and ApplicationExecutor stops listening to resource changes. So even if DcpOptions.DeleteResourcesOnShutdown is set, there is no way for the Aspire model to learn that the container (or any resource for that matter) has changed state, or was deleted.

It would probably make sense to change this behavior and differentiate between stopping the application (which would stop application executables and containers, but preserve the Aspire model and its ability to get state updates) vs disposing of the application (which would dispose of the model, stop DCP orchestrator etc.)

@ealeykin
Copy link

@karolz-ms I'm observing the same behavior even before StopAsync - not getting a Finished notification when the target host exits. Well, from time to time I get it, but most of the time it's stuck on waiting till the timeout.

@davidfowl
Copy link
Member

@DamianEdwards this is the code you added to unhook the notification service on shutdown automatically. We'd need to allow listening to changes past the stopping of the application (but before disposal)

@DamianEdwards
Copy link
Member

We'd need to allow listening to changes past the stopping of the application (but before disposal)

Is there an existing signal for that or would we need to do something custom?

@davidfowl
Copy link
Member

I think shutting down in dispose would be better (implement IDisposable or IAsyncDisposable)

@karolz-ms
Copy link
Member

This issue covers 3 different use cases: stopping a resource vs deleting a resource vs disposing of the application. We should recognize that they are different and are useful for different use cases. We should be clear about what use case we are talking about, what mechanism(s) we want to use for that use case(s), and what works as expected vs needs improvement. Some thoughts on the original use case that prompted @eerhardt to open this issue follow.

To me the right mechanism seems deleting the container that is using a volume. When the container is deleted, the app host should receive a notification and can proceed to delete the container volume that the container was using. This is all supported by the app orchestrator today and should work, provided the app model allows the app host code to do this (not 100% sure, some work may be necessary).

Alternatively the orchestrator could support marking the volume as "should be deleted when the app shuts down". That would be a new feature for the orchestrator. The test could then dispose the whole app and rely on automatic cleanup by app orchestrator.

Stopping a container in this scenario is not sufficient, because a stopped container is still holding a reference to the volume. That is just how Docker/Podman work.

Disposing the app host while listening to notifications is not going to work because (once #7098 is merged) disposing the app triggers wholesale resource cleanup and orchestrator shutdown. The app orchestrator does not make any guarantees about the order in which resources are cleaned up, and it does not guarantee sending notifications about resources being cleaned up either. It just tries to get the machine to clean state as quickly as possible.

@DamianEdwards
Copy link
Member

I think shutting down in dispose would be better (implement IDisposable or IAsyncDisposable)

Disposing the app host while listening to notifications is not going to work because (once #7098 is merged) disposing the app triggers wholesale resource cleanup and orchestrator shutdown. The app orchestrator does not make any guarantees about the order in which resources are cleaned up, and it does not guarantee sending notifications about resources being cleaned up either. It just tries to get the machine to clean state as quickly as possible.

Regardless, it seems it would still make sense to change ResourceNotificationService to not kill the loop until it's disposed so that notifications sent after app host stopping but before disposal can be seen.

@karolz-ms
Copy link
Member

Regardless, it seems it would still make sense to change ResourceNotificationService to not kill the loop until it's disposed so that notifications sent after app host stopping but before disposal can be seen.

Agreed

@davidfowl
Copy link
Member

It turns out when you let DCP shutdown it may not put resources in the Finished state. I observed Stopping and then nothing. This doesn't feel like a bug but we need to decide if this is the desired behavior.

https://github.com/dotnet/aspire/compare/davidfowl/stop-listeing-ondispose?expand=1

@DamianEdwards
Copy link
Member

Yeah this is what I saw IIRC. If you wait a bit after calling .Stop() do more state changes come through? Or is it the case that once you ask DCP to shutdown (via .Stop()) it simply doesn't stick around to ensure it can pump the final resource states back to the app host? Would such a change in DCP be simplish to make? I appreciate that even if DCP sends those updates there's no guarantee the app host reads them from the notification stream before DCP shuts down the endpoint, unless we want to add more coordination to the shutdown process.

@karolz-ms
Copy link
Member

@karolz-ms It seems like we need 2 phase shutdown:

  1. Stop resources
  2. Stop dcp.

As part of 1, you would run all resources through the terminal states, 2 shuts down dcp. The app host would call 1 in StopAsync and 2 in DisposeAsync

We can definitively do this in DCP. To a large degree it is possible already, just not through a single call. Namely, both Containers and Executables have a spec property named Stop; if you set that to true, the object will transition to final state.

@davidfowl davidfowl modified the milestones: Backlog, 9.1 Jan 23, 2025
@davidfowl
Copy link
Member

If it's easy we can do it in 9.1 but we don't have plans to take advantage of this in 9.1

@dbreshears dbreshears modified the milestones: 9.1, Backlog Jan 27, 2025
@karolz-ms
Copy link
Member

Synced up with @eerhardt offline. What @davidfowl (stop resources call + stop DCP call) seems to meet expectations for the scenario.

@karolz-ms karolz-ms added the untriaged New issue has not been triaged label Feb 3, 2025
@karolz-ms
Copy link
Member

We should re-triage, this might be necessary to prevent #7215

@davidfowl davidfowl modified the milestones: Backlog, 9.1 Feb 3, 2025
@karolz-ms karolz-ms removed the untriaged New issue has not been triaged label Feb 3, 2025
@davidfowl davidfowl reopened this Feb 5, 2025
@DamianEdwards
Copy link
Member

Is the work left here to update Aspire.Hosting to issue the relevant two-phase commands to DCP appropriately, i.e. Stop() -> stop, Dispose() -> shutdown?

@davidfowl
Copy link
Member

Yes we never merged #4878 (comment)

@DamianEdwards
Copy link
Member

@davidfowl OK so DCP got updates to ensure it can send final states now and we just need to merge your changes?

@davidfowl
Copy link
Member

I assume we also need to call the new APIs in DCP yep.

@DamianEdwards
Copy link
Member

OK, is this correctly assigned? @karolz-ms are you planning to do the work on the Aspire side for this?

@karolz-ms
Copy link
Member

Yes, I am working on this as we speak. Specifically, on adding a new property to DcpExecutor that determines whether StopAsync() waits for resource cleanup to be complete or not. That will then be leveraged by test application builder. Should have a PR out for review sometime later today.

@karolz-ms
Copy link
Member

Should be resolved with #7446

@davidfowl
Copy link
Member

I think we need a test to actually use this pattern to see if we need more changes to the app model codez.

@DamianEdwards
Copy link
Member

I think we need a test to actually use this pattern to see if we need more changes to the app model codez.

I can update the samples tests to do it. Is there are code snippet someone can point me at?

@eerhardt
Copy link
Member Author

I think we need a test to actually use this pattern to see if we need more changes to the app model codez.

Also #7477 could use this as well.

cc @sebastienros

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants