Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAKING] HTTP.jl 2.0: Overhaul internals by switching to use aws-crt libraries #1213

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Jan 5, 2025

This isn't quite ready to merge yet, but starting the PR to get CI running and document what there is left to do:

[ ] Incorporate @IanButterworth's recent precompile work
[ ] Update documentation: manual and all inline docs; also write up a 1.X -> 2.0 migration guide
[ ] Update upstream AWS CRT libraries and fix anything required in our wrappers

@quinnj
Copy link
Member Author

quinnj commented Jan 30, 2025

Made some progress here, but blocked on 2 issues I need to dig into:

  • There's an occasional segfault from an unhandled C callback BoundsError; it's when we try to write to the temporary Base.BufferStream and it seems like the code should correctly resize the underlying IOBuffer of the BufferStream, but it doesn't for some reason and then we get a BoundsError; I'm actively investigating this now to try to find out if it's an actual bug in Base.BufferStream/IOBuffer, or how we're using BufferStream (pretty vanilla, but maybe?)
  • There's an sporadic test failure where providing an IOBuffer as a response body seems to sometimes duplicate part of the body writing. Possibly related to the first issue.

@quinnj
Copy link
Member Author

quinnj commented Jan 31, 2025

Ok, turns out the segfault is due to Base.BufferStream readavailable not being threadsafe, there's a PR to fix here: JuliaLang/julia#57211. I'll add a compat shim here until that's fully fixed/released. I can confirm that the same root cause was causing both issues mentioned above.

quinnj added a commit to JuliaServices/LibAwsIO.jl that referenced this pull request Feb 4, 2025
Creates a mutable struct wrapper for each default resource so
we can add appropraite finalizers. Also adds a convenience close_
method for each resource that will finalize the current resource and
reset it to "null".

This machinery is necessary for precompile workloads to appropriately
do a total cleanup of all process resources (as discovered in
JuliaWeb/HTTP.jl#1213)
quinnj added a commit to JuliaServices/LibAwsIO.jl that referenced this pull request Feb 4, 2025
Creates a mutable struct wrapper for each default resource so
we can add appropraite finalizers. Also adds a convenience close_
method for each resource that will finalize the current resource and
reset it to "null".

This machinery is necessary for precompile workloads to appropriately
do a total cleanup of all process resources (as discovered in
JuliaWeb/HTTP.jl#1213)
Comment on lines +33 to +37
close_all_clients!()
close_default_aws_server_bootstrap!()
close_default_aws_client_bootstrap!()
close_default_aws_host_resolver!()
close_default_aws_event_loop_group!()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a new event loop group, etc. be created instead of using the default? I wouldn't want this package to conflict with others because they both use default resources.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, it's better that they do share the same event loop group. The point is that the default group should be able to saturate os resources, so it's better that packages/libraries coordinate tasks in the same group instead of having competing groups then possibly scheduling tasks in a non-optimal configuration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point, but if an event loop group is shared, who is responsible for closing it? It could be closed by one user when another user still needs it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the names of these functions is confusing because AFAIU the aws calls are referencing/de-referencing them, so it closes when the counter reaches zero. So confusing to say close?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point, but if an event loop group is shared, who is responsible for closing it? It could be closed by one user when another user still needs it.

In my mind, (and it would probably be helpful to document this perhaps in the LibAwsIO README or something), there are 2 different scenarios for LibAwsIO usage:

  1. Casual, REPL usage, scripts, etc. In this case, you may load libraries (Postgres.jl, HTTP.jl, etc.) where you want things to "just work" and do sensible default things. That's currently the case w/ the machinery here. Sensible default resources are set up, they can be used, we're not too worried about cleaning things up because everything will be taken care of when the process exits.
  2. Private application. In this case, again, you load libraries or perhaps use LibAwsIO directly, but you may have unique, specific settings you want to set for these LibAwsIO resources. You're free to create your own (not too hard, but takes some manual setup/effort), and then you can set them in LibAwsIO so that any package that relies on them will use the same, manually set up resources you'd like. This is as opposed to needing to pass the resources manually, for example, to every HTTP call you may have in a huge, nested application. Those kinds of one-off custom resources should still be supported, so if you had a specific HTTP request that you wanted to use slightly different resource group, you could pass them to that call specifically.

So to circle back and answer the original question: if you're relying on the default shared resources, then you shouldn't be closing them or worrying about closing them. They're global shared. If you're in a situation where you need to close them, then you should probably be initializing them yourself and passing them or setting them as the defaults yourself, and then you're free to close them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the names of these functions is confusing because AFAIU the aws calls are referencing/de-referencing them, so it closes when the counter reaches zero. So confusing to say close?

Hmmmm.....perhaps. But we are specifically calling this close_*default*_... which we are specifically interested in closing the default resources that have been set here. So yes, if you happen to have your own references to those resources, then calling these will prematurely finalize then and you wouldn't want to use them afterwards.

We could perhaps compose these a bit better with individual close methods on, e.g. the EventLoopGroup and then these close defaults could use that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quinnj added a commit to JuliaLang/julia that referenced this pull request Feb 4, 2025
It looks like these methods were just missed while overloading for
BufferStream.

There's also `readbytes!` where the current implementation will fallback
to the `LibuvStream` implementation that is currently not threadsafe.
What's the best approach there since the implementation is quite a bit
more involved? Just duplicate the code but for BufferStream? Should we
take the BufferStream lock and invoke the LibuvStream method? Open to
ideas there.

Also open to suggestions for having tests here? Not easy to simulate the
data race of writing and calling readavailable.

The fix here will unblock JuliaWeb/HTTP.jl#1213
(I'll probably do some compat shim there until this is fully released).

Thanks to @oscardssmith for rubber ducking this issue with me.

Probably most helpfully reviewed by @vtjnash.

---------

Co-authored-by: Jameson Nash <[email protected]>
KristofferC pushed a commit to JuliaLang/julia that referenced this pull request Feb 6, 2025
It looks like these methods were just missed while overloading for
BufferStream.

There's also `readbytes!` where the current implementation will fallback
to the `LibuvStream` implementation that is currently not threadsafe.
What's the best approach there since the implementation is quite a bit
more involved? Just duplicate the code but for BufferStream? Should we
take the BufferStream lock and invoke the LibuvStream method? Open to
ideas there.

Also open to suggestions for having tests here? Not easy to simulate the
data race of writing and calling readavailable.

The fix here will unblock JuliaWeb/HTTP.jl#1213
(I'll probably do some compat shim there until this is fully released).

Thanks to @oscardssmith for rubber ducking this issue with me.

Probably most helpfully reviewed by @vtjnash.

---------

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit ffc96bc)
KristofferC pushed a commit to JuliaLang/julia that referenced this pull request Feb 6, 2025
It looks like these methods were just missed while overloading for
BufferStream.

There's also `readbytes!` where the current implementation will fallback
to the `LibuvStream` implementation that is currently not threadsafe.
What's the best approach there since the implementation is quite a bit
more involved? Just duplicate the code but for BufferStream? Should we
take the BufferStream lock and invoke the LibuvStream method? Open to
ideas there.

Also open to suggestions for having tests here? Not easy to simulate the
data race of writing and calling readavailable.

The fix here will unblock JuliaWeb/HTTP.jl#1213
(I'll probably do some compat shim there until this is fully released).

Thanks to @oscardssmith for rubber ducking this issue with me.

Probably most helpfully reviewed by @vtjnash.

---------

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit ffc96bc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants