Fix memory leak in AsyncCompletions.parse() with dynamically created models #2148

Open · wants to merge 2 commits into base: main
15 changes: 14 additions & 1 deletion src/openai/lib/_parsing/_completions.py
@@ -1,6 +1,7 @@
 from __future__ import annotations

 import json
+import weakref
 from typing import TYPE_CHECKING, Any, Iterable, cast
 from typing_extensions import TypeVar, TypeGuard, assert_never

@@ -28,6 +29,9 @@
 from ...types.chat.completion_create_params import ResponseFormat as ResponseFormatParam
 from ...types.chat.chat_completion_message_tool_call import Function

+# Cache to store weak references to schema objects
+_schema_cache = weakref.WeakKeyDictionary()
Comment on lines +32 to +33

@anteverse commented on Mar 6, 2025:

Hi @mousberg,

Thanks for digging into this. As I understand it, this cache helps optimize type_to_response_format_param by not recomputing a schema that already exists, and I guess it solves issue #2146 for the given example. But I don't see how it could stop memory from rising when we submit models that are always different.
Could we set some kind of maximum size for this cache? Something like an LRU?
Your first test case actually brings a very good insight: if we keep generating new models, the cache size grows without bound.

@mousberg (author) replied:

Thanks for the feedback, @anteverse.

That's a good point about unbounded memory growth when continuously generating new models.

I did look into an LRU cache, but ran into challenges with picking the right cache key strategy (model identity vs. schema-based hashing) while also making sure weak references were properly maintained.

I see the problem as twofold:

  1. Ensuring proper garbage collection for unreferenced models
  2. Limiting cache size to avoid infinite growth with unique models

Your insight on the test case really highlights where things get tricky. If anyone's up for digging into an LRU-based approach, I'd love to collaborate!
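
For illustration, here is a rough sketch of the kind of size-bounded LRU cache being floated in this thread. It is not part of this PR; the class name, the string key type, and the 128-entry limit are all hypothetical placeholders:

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUSchemaCache:
    """Hypothetical bounded cache: evicts the least recently used entry once full."""

    def __init__(self, max_size: int = 128) -> None:
        self._max_size = max_size
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```

How the limit should be chosen (fixed, user-configurable, or machine-dependent) is exactly the open question raised further down in the thread.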

@anteverse replied:

Hey,
Thanks for narrowing down the problems; I think you did most of the work.

> digging into an LRU-based approach, I'd love to collaborate!

I'd love to collaborate as well.

> ran into challenges with picking the right cache key strategy (model identity vs. schema-based hashing)

By any chance, do you have any connections at Pydantic? The folks there could help come up with a common key.

By the way, if we end up with a cache that has a limited size, what would be a good measure for the limit? Should it be frozen? Configurable by the end user? Should we have a heuristic based on the machine's capacity?

I'll run a few tests on my end as well!

@mousberg (author) replied:

@anteverse Yeah, I reached out to Samuel (Pydantic's founder), and he suggested an approach for the cache key: build a string from the fields and types, optionally hash it, and use that as the cache key.

He also mentioned that if we need more input, it might be worth posting on Pydantic's GitHub to see if the community has additional insights.

Thoughts? I can prototype this approach and see if it addresses the issue!
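
For illustration, a minimal sketch of that key-derivation idea, assuming Pydantic v2 (where model classes expose `model_fields`); the helper name is hypothetical and not part of this PR:

```python
import hashlib

from pydantic import BaseModel, Field, create_model


def schema_cache_key(model: type[BaseModel]) -> str:
    """Build a stable string from the field names and annotations, then hash it."""
    parts = [model.__name__]
    for name, field in model.model_fields.items():
        parts.append(f"{name}:{field.annotation}")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


# Two structurally identical dynamic models produce the same key,
# so a key-based cache would not grow when the "same" model is re-created.
A = create_model("Step", explanation=(str, Field()), output=(str, Field()))
B = create_model("Step", explanation=(str, Field()), output=(str, Field()))
assert schema_cache_key(A) == schema_cache_key(B)
```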

A repository collaborator replied:

> But I don't see how it could stop memory from rising when we submit models that are always different.
> Could we set some kind of maximum size for this cache?

Maybe I'm missing something, but I don't see how this is a concern. If you're continuously creating new models, they should be getting GC'd at some point; and if they're not, then you're still holding a reference to the model type yourself. If that's the case, can you share more about your use case?
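
For reference, a minimal sketch of the `WeakKeyDictionary` behaviour this argument relies on: once the last strong reference to a model class is dropped (and its reference cycles are collected), the cache entry disappears on its own. The model name here is hypothetical:

```python
import gc
import weakref

from pydantic import Field, create_model

cache: "weakref.WeakKeyDictionary[type, dict]" = weakref.WeakKeyDictionary()

model = create_model("Example", value=(str, Field()))
cache[model] = {"type": "json_schema"}
assert len(cache) == 1

del model     # drop the only strong reference to the model class
gc.collect()  # class objects live in reference cycles, so force a collection
print(len(cache))  # expected: 0, once the class has actually been collected
```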


 ResponseFormatT = TypeVar(
     "ResponseFormatT",
     # if it isn't given then we don't do any parsing
@@ -243,6 +247,10 @@ def type_to_response_format_param(
     # can only be a `type`
     response_format = cast(type, response_format)

+    # Check if we already have a schema for this type in the cache
+    if response_format in _schema_cache:
+        return _schema_cache[response_format]
+
     json_schema_type: type[pydantic.BaseModel] | pydantic.TypeAdapter[Any] | None = None

     if is_basemodel_type(response_format):
@@ -254,11 +262,16 @@
     else:
         raise TypeError(f"Unsupported response_format type - {response_format}")

-    return {
+    schema_param: ResponseFormatParam = {
         "type": "json_schema",
         "json_schema": {
             "schema": to_strict_json_schema(json_schema_type),
             "name": name,
             "strict": True,
         },
     }
+
+    # Cache the schema param, keyed weakly by the model type
+    _schema_cache[response_format] = schema_param
+
+    return schema_param
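
A quick way to exercise the change above on this branch: a second call with the same model type should return the cached param object instead of rebuilding the schema. The `Weather` model is just an example; the imports match the test file below:

```python
from pydantic import BaseModel

from openai.lib._parsing import type_to_response_format_param
from openai.lib._parsing._completions import _schema_cache


class Weather(BaseModel):
    city: str
    temperature: float


first = type_to_response_format_param(Weather)
second = type_to_response_format_param(Weather)

assert first is second           # the second call is served straight from the cache
assert Weather in _schema_cache  # keyed weakly by the model type
```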
50 changes: 50 additions & 0 deletions tests/lib/_parsing/test_memory_leak.py
@@ -0,0 +1,50 @@
import gc
from typing import List

import pytest
from pydantic import Field, create_model

from openai.lib._parsing import type_to_response_format_param
from openai.lib._parsing._completions import _schema_cache


@pytest.mark.asyncio
async def test_async_completions_parse_memory():
    """Test that AsyncCompletions.parse() doesn't leak memory with dynamic models"""
    StepModel = create_model(
        "Step",
        explanation=(str, Field()),
        output=(str, Field()),
    )

    # Clear the cache and record the initial state
    _schema_cache.clear()
    initial_cache_size = len(_schema_cache)

    # Simulate the issue by creating multiple models and making calls
    models = []
    for i in range(10):
        # Create a new dynamic model each time
        new_model = create_model(
            f"MathResponse{i}",
            steps=(List[StepModel], Field()),
            final_answer=(str, Field()),
        )
        models.append(new_model)

        # Convert to response format and check that it lands in the cache
        type_to_response_format_param(new_model)
        assert new_model in _schema_cache

    # Record the cache size while all models are still referenced
    cache_size_with_references = len(_schema_cache)

    # Let the models go out of scope and trigger garbage collection
    models = None
    gc.collect()

    # After garbage collection, the cache should be significantly reduced
    cache_size_after_gc = len(_schema_cache)
    assert cache_size_after_gc < cache_size_with_references
    # With the strong references gone, well over half of the entries should have been collected
    assert cache_size_after_gc < cache_size_with_references / 2