
Fix memory leak in AsyncCompletions.parse() with dynamically created models #2148

Open

mousberg wants to merge 2 commits into main

Conversation

mousberg

Fixes #2146

This PR fixes a memory leak in the manually maintained library code (src/openai/lib/_parsing/_completions.py) where AsyncCompletions.parse() was retaining schema representations of dynamically created Pydantic models indefinitely.

Changes Made:

  • Implemented WeakKeyDictionary cache for schema objects in _parsing/_completions.py
  • Added comprehensive test suite in tests/lib/_parsing/test_memory_leak.py

Technical Details:

The fix uses Python's WeakKeyDictionary to store schema representations, allowing them to be garbage collected when their corresponding model types are no longer referenced. This prevents the unbounded memory growth observed when repeatedly calling parse() with new models created via create_model().
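As a rough illustration of the approach (a minimal sketch rather than the actual diff; cached_schema is a hypothetical helper standing in for the type_to_response_format_param path, and model_json_schema() stands in for the real schema construction, assuming Pydantic v2):

import weakref
from typing import Any

from pydantic import BaseModel

# Keys are the model types themselves and are held weakly, so an entry
# disappears on its own once nothing else references the model type.
_schema_cache: "weakref.WeakKeyDictionary[type[BaseModel], dict[str, Any]]" = weakref.WeakKeyDictionary()


def cached_schema(model: type[BaseModel]) -> dict[str, Any]:
    cached = _schema_cache.get(model)
    if cached is None:
        cached = model.model_json_schema()  # stand-in for the real schema construction
        _schema_cache[model] = cached
    return cached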

Test Coverage:

New test suite verifies:

  • Schema cache properly handles dynamic models
  • Memory is released when models are no longer referenced (see the test sketch after this list)
  • Both synchronous and asynchronous usage patterns
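A rough pytest sketch of the second point, reusing the hypothetical cached_schema helper from the sketch above (and assuming nothing else in the process keeps the dynamic model alive):

import gc
import weakref

from pydantic import create_model


def test_dynamic_model_is_released() -> None:
    # Build a throwaway model, populate the cache, then drop the only strong
    # reference and check that the weakly keyed cache does not pin it alive.
    model = create_model("TempModel", value=(int, ...))
    cached_schema(model)
    ref = weakref.ref(model)
    del model
    gc.collect()  # classes sit in reference cycles, so force a collection
    assert ref() is None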

References:

  • Issue #2146: Possible memory leak in AsyncCompletions.parse()

…models

This commit fixes a memory leak in AsyncCompletions.parse() when it is repeatedly
called with Pydantic models created via create_model(). The issue occurred because
schema representations of the models were retained indefinitely.

The fix implements a WeakKeyDictionary cache that allows the schema objects to be
garbage-collected when the model types are no longer referenced elsewhere in code.

Added test cases to verify the fix prevents memory leaks with dynamically created
models.
mousberg requested a review from a team as a code owner on February 27, 2025, 17:32
Comment on lines +32 to +33
# Cache schemas keyed by weak references to their model types
_schema_cache = weakref.WeakKeyDictionary()

anteverse commented on Mar 6, 2025


Hi @mousberg ,

Thanks for digging into this. As I understand it, this cache could help optimize type_to_response_format_param, since it won't recompute a schema that already exists. I suppose it solves issue #2146 for the given example, but I don't see how it could stop memory from growing when we submit models that are always different.
Could we put some kind of maximum size on this cache? Something like an LRU?
Your first test case actually brings a very good insight: if we keep generating new models, the cache size grows without bound.

mousberg (Author)

Thanks for the feedback, @anteverse

That's a good point about unbounded memory growth when continuously generating new models.

I did look into an LRU cache, but ran into challenges with picking the right cache key strategy (model identity vs. schema-based hashing) while also making sure weak references were properly maintained.

I see the problem as twofold:

  1. Ensuring proper garbage collection for unreferenced models
  2. Limiting cache size to avoid infinite growth with unique models

Your insight on the test case really highlights where things get tricky. If anyone's up for digging into an LRU-based approach, I'd love to collaborate!
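One way to see the tension between those two points (an illustration only, not code from this PR): a plain functools.lru_cache keyed on the model type would cap the cache size, but its keys are strong references, so a model the caller has dropped stays alive until it is evicted.

import functools
import gc
import weakref

from pydantic import BaseModel, create_model


@functools.lru_cache(maxsize=128)
def lru_schema(model: type[BaseModel]) -> str:
    # The cache key is the model type itself, held strongly by lru_cache.
    return str(model.model_json_schema())


m = create_model("Temp", x=(int, ...))
lru_schema(m)
ref = weakref.ref(m)
del m
gc.collect()
assert ref() is not None  # still pinned by the lru_cache key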

anteverse

Hey,
Thanks for narrowing down the problems; I think you did most of the work.

> digging into an LRU-based approach, I'd love to collaborate!

I'd love to collaborate as well.

> ran into challenges with picking the right cache key strategy (model identity vs. schema-based hashing)

By any chance, do you have any connections at Pydantic? The folks there could help come up with a common key.

By the way, if we end up with a cache that has a limited size, what would be a good measure for the limit? Should it be fixed? Configurable by the end user? Should we have a heuristic based on the machine's capacity?

I'll run a few tests on my end as well!

mousberg (Author)

@anteverse Yeah, I reached out to Samuel (Pydantic's founder), and he suggested an approach for the cache key: build a string from the fields and types, optionally hash it, and use that as the key.

He also mentioned that if we need more input, it might be worth posting on Pydantic's GitHub to see if the community has additional insights.

Thoughts? I can prototype this approach and see if it addresses the issue!
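A quick sketch of what such a key could look like (schema_cache_key is a hypothetical helper, assuming Pydantic v2's model_fields; whether the model name should be part of the key is an open question, since it ends up in the schema title):

import hashlib

from pydantic import BaseModel, create_model


def schema_cache_key(model: type[BaseModel]) -> str:
    # Stable string built from the model name plus field names and annotations,
    # hashed so that structurally identical dynamic models share one cache entry.
    fields = ",".join(
        f"{name}:{field.annotation}" for name, field in sorted(model.model_fields.items())
    )
    return hashlib.sha256(f"{model.__name__}({fields})".encode()).hexdigest()


A = create_model("Point", x=(int, ...), y=(int, ...))
B = create_model("Point", x=(int, ...), y=(int, ...))
assert schema_cache_key(A) == schema_cache_key(B)  # same structure, same key

Keyed this way, a bounded dict (for example an OrderedDict that evicts its oldest entry past a maximum size) would give LRU behaviour without holding any reference to the model classes.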

Collaborator

> But I don't see how it could stop memory from growing when we submit models that are always different.
> Could we put some kind of maximum size on this cache?

Maybe I'm missing something, but I don't see how this is a concern? If you're continuously creating new models, they should be getting GC'd at some point, and if they're not, then you're still holding a reference to the model type yourself. If that is the case, can you share more about your use case?

RobertCraigie (Collaborator) left a comment

Thanks for the PR! This approach seems good, but the tests need cleaning up and there are lint errors.

Comment on lines 68 to 73
# Create a mock client
mock_client = MagicMock()
mock_client.chat.completions.create = AsyncMock()

# Create the AsyncCompletions instance with our mock client
completions = AsyncCompletions(mock_client)
RobertCraigie (Collaborator)

this seems to be unused?

from openai.lib._parsing import type_to_response_format_param
from openai.lib._parsing._completions import _schema_cache

class TestMemoryLeak(unittest.TestCase):
RobertCraigie (Collaborator)

we do not use unittest, can you delete this? it seems to be the same as the pytest version anyway

@@ -254,11 +262,16 @@ def type_to_response_format_param(
     else:
         raise TypeError(f"Unsupported response_format type - {response_format}")

-    return {
+    schema_param = {
RobertCraigie (Collaborator)

nit:

Suggested change:
-    schema_param = {
+    schema_param: ResponseFormatParam = {

mousberg (Author)
@RobertCraigie Thanks for the feedback! I've addressed all the requested changes:

  • Removed the unittest class and switched to using pytest exclusively
  • Removed unused mock client and AsyncCompletions instance
  • Added the suggested type annotation: schema_param: ResponseFormatParam = {
  • Fixed all lint issues in the test file

The tests are now passing and the lint checks are clean. Let me know if I've missed anything.

Successfully merging this pull request may close these issues:

  • Possible memory leak in AsyncCompletions.parse() (#2146)

3 participants