Fix memory leak in AsyncCompletions.parse() with dynamically created models #2148
Open
mousberg wants to merge 2 commits into openai:main from mousberg:fix-memory-leak-in-async-completions-parse
@@ -0,0 +1,50 @@
import gc
from typing import List

import pytest
from pydantic import Field, create_model

from openai.lib._parsing import type_to_response_format_param
from openai.lib._parsing._completions import _schema_cache


@pytest.mark.asyncio
async def test_async_completions_parse_memory():
    """Test that AsyncCompletions.parse() doesn't leak memory with dynamic models"""
    StepModel = create_model(
        "Step",
        explanation=(str, Field()),
        output=(str, Field()),
    )

    # Clear the cache and record initial state
    _schema_cache.clear()
    initial_cache_size = len(_schema_cache)

    # Simulate the issue by creating multiple models and making calls
    models = []
    for i in range(10):
        # Create a new dynamic model each time
        new_model = create_model(
            f"MathResponse{i}",
            steps=(List[StepModel], Field()),
            final_answer=(str, Field()),
        )
        models.append(new_model)

        # Convert to response format and check that it's in the cache
        type_to_response_format_param(new_model)
        assert new_model in _schema_cache

    # Record cache size with all models still referenced
    cache_size_with_references = len(_schema_cache)

    # Let the models go out of scope and trigger garbage collection
    models = None
    gc.collect()

    # After garbage collection, the cache should be significantly reduced
    cache_size_after_gc = len(_schema_cache)
    assert cache_size_after_gc < cache_size_with_references
    # The cache size should be close to the initial size (with some tolerance)
    assert cache_size_after_gc < cache_size_with_references / 2
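For context, the assertions above only hold if _schema_cache drops an entry once its model class has been garbage-collected, i.e. if the cache holds its keys weakly. The following is a minimal sketch of that idea under that assumption; it is illustrative only, not the actual implementation in openai.lib._parsing._completions, and the response-format dict shape is simplified.

import weakref
from typing import Any, Dict, Type

from pydantic import BaseModel

# Sketch only: cache keyed weakly by the model class, so an entry disappears
# as soon as the model itself is garbage-collected.
_schema_cache: "weakref.WeakKeyDictionary[Type[BaseModel], Dict[str, Any]]" = weakref.WeakKeyDictionary()


def to_response_format(model: Type[BaseModel]) -> Dict[str, Any]:
    """Return a cached response-format dict for the model, computing it only once."""
    cached = _schema_cache.get(model)
    if cached is None:
        cached = {
            "type": "json_schema",
            "json_schema": {"name": model.__name__, "schema": model.model_json_schema()},
        }
        _schema_cache[model] = cached
    return cached

Keyed this way, the cache never keeps a dynamically created model alive on its own, which is exactly what the test checks after gc.collect().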
Hi @mousberg,
Thanks for digging into this. As I understand it, this cache helps optimize type_to_response_format_param by not recomputing a schema that already exists, and I guess it solves issue #2146 for the given example. But I don't see how it stops memory from growing when the models we submit are always different. Could we put a maximum size on this cache, something like an LRU?
Your first test case actually brings a very good insight: if we keep generating new models, the cache size grows without bound.
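For illustration, a size-bounded cache along those lines could be as simple as an OrderedDict used LRU-style. This is only a sketch: _MAX_SCHEMA_CACHE_SIZE, cache_get, and cache_put are made-up names rather than anything in the SDK, and the key type is left open because that is exactly the unresolved question.

from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional

_MAX_SCHEMA_CACHE_SIZE = 128  # hypothetical limit, not an actual SDK setting
_schema_cache: "OrderedDict[Hashable, Dict[str, Any]]" = OrderedDict()


def cache_get(key: Hashable) -> Optional[Dict[str, Any]]:
    value = _schema_cache.get(key)
    if value is not None:
        _schema_cache.move_to_end(key)  # mark as most recently used
    return value


def cache_put(key: Hashable, value: Dict[str, Any]) -> None:
    _schema_cache[key] = value
    _schema_cache.move_to_end(key)
    while len(_schema_cache) > _MAX_SCHEMA_CACHE_SIZE:
        _schema_cache.popitem(last=False)  # evict the least recently used entry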
Thanks for the feedback, @anteverse
That's a good point about unbounded memory growth when continuously generating new models.
I did look into an LRU cache, but ran into challenges with picking the right cache key strategy (model identity vs. schema-based hashing) while also making sure weak references were properly maintained.
I see the problem as twofold: keeping the cache from growing without bound when the models are always new, and picking a cache key that identifies equivalent models without keeping them alive in memory.
Your insight on the test case really highlights where things get tricky. If anyone's up for digging into an LRU-based approach, I'd love to collaborate!
Hey,
Thanks for narrowing down the problems; I think you did most of the work.
I'd love to collaborate as well.
By any chance, do you have any connections at Pydantic? The folks there could help come up with a common key.
By the way, if we end up with a cache that has a limited size, what would be a good measure for the limit? Should it be fixed? Configurable by the end user? Should we use a heuristic based on the machine's capacity?
I'll run a few tests on my end as well!
@anteverse Yeah, I reached out to Samuel (Pydantic's founder), and he suggested an approach for the cache key:
Build a string from the fields and types, optionally hash it, and use that as the cache key.
He also mentioned that if we need more input, it might be worth posting on Pydantic's GitHub to see if the community has additional insights.
Thoughts? I can prototype this approach and see if it addresses the issue!
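A rough sketch of that idea, assuming Pydantic v2 (model_fields, FieldInfo.annotation, and FieldInfo.is_required are v2 APIs) and with schema_cache_key as a made-up name:

import hashlib
from typing import Type

from pydantic import BaseModel


def schema_cache_key(model: Type[BaseModel]) -> str:
    """Build a stable key from the model's field names and annotations."""
    parts = []
    for name, field in sorted(model.model_fields.items()):
        parts.append(f"{name}:{field.annotation!r}:{field.is_required()}")
    # Hashing is optional; it just keeps the key short.
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

One open design point is whether the class name belongs in the key. Leaving it out means regenerated models with identical fields (like the MathResponse{i} classes in the test) share a single cache entry, which seems to be the point of the exercise.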
Maybe I'm missing something, but I don't see how this is a concern. If you're continuously creating new models, they should be getting GC'd at some point; if they're not, then you're still holding a reference to the model type yourself. If that is the case, can you share more about your use case?
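For illustration, this is the behaviour being described, assuming nothing else (a cache holding strong references, a module-level registry, etc.) keeps the class alive:

import gc
import weakref

from pydantic import create_model

Model = create_model("Throwaway", value=(str, ...))
ref = weakref.ref(Model)

del Model
gc.collect()  # class objects sit in reference cycles, so a collection pass is needed

# If nothing else holds a strong reference, the dynamically created model
# is collected like any other class object.
print("collected:", ref() is None)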