@@ -15,6 +15,8 @@ def default(self, o):
             return dataclasses.asdict(o)
         if hasattr(o, "to_json"):
             return o.to_json()
+        if hasattr(o, "model_dump_json"):
+            return o.model_dump_json()
Comment on lines +18 to +19
@coderabbitai bot Dec 22, 2025


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

    cat -n packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py | head -100

Repository: traceloop/openllmetry

Length of output: 3645


Replace model_dump_json() with model_dump().

The model_dump_json() method returns a JSON string, but JSONEncoder.default() expects a Python object (dict, list, etc.). Returning a JSON string causes it to be escaped and encoded again by json.dumps(), resulting in malformed JSON data in span attributes.
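The double-encoding is easy to reproduce in isolation. The sketch below uses a plain class as a stand-in for a Pydantic v2 model; FakeModel, StringEncoder, and DictEncoder are illustrative names, not code from this PR:

```python
import json

# Stand-in for a Pydantic v2 model: model_dump_json() returns a JSON *string*,
# model_dump() returns a plain dict (mirroring the real methods' contracts).
class FakeModel:
    def model_dump_json(self):
        return '{"value":5}'

    def model_dump(self):
        return {"value": 5}

class StringEncoder(json.JSONEncoder):
    def default(self, o):
        if hasattr(o, "model_dump_json"):
            return o.model_dump_json()  # a str comes back, so json.dumps() escapes it again
        return super().default(o)

class DictEncoder(json.JSONEncoder):
    def default(self, o):
        if hasattr(o, "model_dump"):
            return o.model_dump()  # a dict comes back and is encoded as a JSON object
        return super().default(o)

print(json.dumps(FakeModel(), cls=StringEncoder))  # "{\"value\":5}"  <- escaped, malformed
print(json.dumps(FakeModel(), cls=DictEncoder))    # {"value": 5}    <- proper JSON object
```

The first output is the malformed span-attribute payload the review describes: a JSON string literal containing escaped JSON, rather than an object.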

Use model_dump() instead, which returns a dictionary:

-        if hasattr(o, "model_dump_json"):
-            return o.model_dump_json()
+        if hasattr(o, "model_dump"):
+            return o.model_dump()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        if hasattr(o, "model_dump_json"):
-            return o.model_dump_json()
+        if hasattr(o, "model_dump"):
+            return o.model_dump()
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py around lines 18-19, the code calls model_dump_json(), which returns a JSON string and causes double-encoding when JSONEncoder.default() hands the value to json.dumps(); change the call to model_dump() so a Python dict/list is returned instead (and ensure the returned object is JSON-serializable before returning).

Author


This is actually a good point that potentially applies to the original change in the SDK as well.

When we call model_dump_json we get back a string representation of that class. If we instead call o.model_dump(mode='json') we get back a Python dictionary where every value is guaranteed to be a JSON-dumpable type. (See the documentation here.)

Here is an example of what calling o.model_dump(mode='json') instead of o.model_dump_json() looks like in Traceloop:
[screenshot]

Compare this with the screenshot in the initial PR comment, or here in the span listing, for a before/after of the original implementation vs. calling o.model_dump(mode='json'):
[screenshot]

IMO model_dump is preferable.


🧩 Analysis chain

🏁 Script executed:

    #!/bin/bash
    # Check all three JSONEncoder files for model_dump_json usage
    echo "=== traceloop-sdk/utils/json_encoder.py ==="
    rg -n "model_dump_json" packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py -C 2

    echo -e "\n=== haystack/utils.py ==="
    rg -n "model_dump_json" packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py -C 2

    echo -e "\n=== llamaindex/utils.py ==="
    rg -n "model_dump_json" packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py -C 2

Length of output: 702


@ryanamannion, thank you for the detailed analysis and the visual evidence! You're absolutely right that model_dump(mode='json') is the better approach here. The screenshots clearly show the improvement—avoiding the double-encoding issue where the JSON string gets escaped again.

All three JSON encoder files currently use model_dump_json() and should be updated consistently to use model_dump(mode='json'):

  1. packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py (lines 17-18)
  2. packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py (lines 18-19)
  3. packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py (lines 67-68)

The suggested change for each file:

if hasattr(o, "model_dump"):
    return o.model_dump(mode='json')

This ensures that:

  • A Python dictionary is returned (not a JSON string)
  • All values in the dictionary are JSON-serializable
  • The JSON encoder can properly encode the structure without double-encoding

The mode='json' parameter is crucial as it handles types like datetime, UUID, etc., converting them to JSON-compatible representations.
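A minimal sketch of what mode='json' buys you, assuming Pydantic v2 is installed (the Event model here is hypothetical, used only to illustrate the difference):

```python
import json
from datetime import datetime, timezone
from uuid import UUID
from pydantic import BaseModel

# Hypothetical model with fields that are not natively JSON-serializable.
class Event(BaseModel):
    id: UUID
    at: datetime

e = Event(
    id=UUID("12345678-1234-5678-1234-567812345678"),
    at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)

plain = e.model_dump()               # values stay UUID/datetime objects -> json.dumps() raises TypeError
as_json = e.model_dump(mode="json")  # values are already JSON-compatible strings

print(json.dumps(as_json))  # serializes cleanly, no custom encoder needed
```

Without mode='json', the encoder would still have to handle the UUID and datetime values itself.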

return super().default(o)


@@ -64,6 +64,8 @@ class JSONEncoder(json.JSONEncoder):
     def default(self, o):
         if dataclasses.is_dataclass(o):
             return dataclasses.asdict(o)
+        if hasattr(o, "model_dump_json"):
+            return o.model_dump_json()
Comment on lines +67 to +68

⚠️ Potential issue | 🔴 Critical

Critical: Use model_dump() instead of model_dump_json().

Same issue as in the haystack utils file: model_dump_json() returns a JSON string, but JSONEncoder.default() expects a Python object. This causes double-encoding when json.dumps() is called with this encoder (lines 81, 90).

🔎 Proposed fix
-        if hasattr(o, "model_dump_json"):
-            return o.model_dump_json()
+        if hasattr(o, "model_dump"):
+            return o.model_dump()
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py around lines 67-68, the encoder currently calls model_dump_json(), which returns a JSON string; replace it with model_dump() so the encoder returns a Python object (dict/list) as expected by JSONEncoder.default(), avoiding double-encoding when json.dumps() is invoked later (see uses around lines 81 and 90).

        elif hasattr(o, "json"):
            return o.json()
Comment on lines 69 to 70

⚠️ Potential issue | 🔴 Critical

Review the .json() method for potential double-encoding.

The .json() method (Pydantic v1/v2) also returns a JSON string, not a dict. This has the same double-encoding issue as model_dump_json(). Consider using .dict() (Pydantic v1) or checking which method is appropriate for the object type.

For Pydantic models specifically:

  • v1: use .dict() instead of .json()
  • v2: use .model_dump() instead of .model_dump_json()
🔎 Suggested approach for backward compatibility
-        if hasattr(o, "model_dump_json"):
-            return o.model_dump_json()
-        elif hasattr(o, "json"):
-            return o.json()
+        if hasattr(o, "model_dump"):
+            return o.model_dump()
+        elif hasattr(o, "dict"):
+            return o.dict()
         elif hasattr(o, "to_json"):
             return o.to_json()

Note: Keep the to_json() branch for non-Pydantic objects that have custom serialization.
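As a sketch, the fallback order described above (model_dump -> dict -> to_json -> json) might look like this; CompatEncoder, V2Like, and V1Like are hypothetical, stdlib-only stand-ins for Pydantic v2/v1 models, not code from this PR:

```python
import json

class CompatEncoder(json.JSONEncoder):
    """Prefer methods that return Python objects; fall back to
    JSON-string-returning methods only as a last resort."""
    def default(self, o):
        if hasattr(o, "model_dump"):        # Pydantic v2 -> dict
            return o.model_dump(mode="json")
        if hasattr(o, "dict"):              # Pydantic v1 -> dict
            return o.dict()
        if hasattr(o, "to_json"):           # custom serialization hooks
            return o.to_json()
        if hasattr(o, "json"):              # may return a JSON string
            return o.json()
        return super().default(o)

# Stand-in for a Pydantic v2 model: both methods exist, the dict one wins.
class V2Like:
    def model_dump(self, mode=None):
        return {"v": 2}
    def model_dump_json(self):
        return '{"v":2}'

# Stand-in for a Pydantic v1 model: .dict() is preferred over .json().
class V1Like:
    def dict(self):
        return {"v": 1}
    def json(self):
        return '{"v":1}'

print(json.dumps(V2Like(), cls=CompatEncoder))  # {"v": 2}
print(json.dumps(V1Like(), cls=CompatEncoder))  # {"v": 1}
```

Because the dict-returning branches are checked first, neither stand-in ever hits the string-returning methods, so no double-encoding occurs.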

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py around lines 69-70, the current branch calling o.json() can produce a JSON string and cause double-encoding; update the logic to detect Pydantic models and prefer .dict() for v1 or .model_dump() for v2 (e.g., check for attributes like model_dump/model_dump_json or dict/json and call the non-JSON-returning method), falling back to the existing to_json() or .json() only for non-Pydantic objects that intentionally return a JSON string; ensure the branch checks the right attributes in order (model_dump -> dict -> to_json -> json) and returns a dict rather than a JSON string.

        elif hasattr(o, "to_json"):
2 changes: 1 addition & 1 deletion packages/traceloop-sdk/pyproject.toml
@@ -68,7 +68,7 @@ opentelemetry-instrumentation-groq = { path = "../opentelemetry-instrumentation-
 opentelemetry-instrumentation-mcp = { path = "../opentelemetry-instrumentation-mcp", develop = true }
 colorama = "^0.4.6"
 tenacity = ">=8.2.3, <10.0"
-pydantic = ">=1"
+pydantic = ">=2"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

    # Read the pyproject.toml around the specified lines
    sed -n '65,95p' packages/traceloop-sdk/pyproject.toml | cat -n

Repository: traceloop/openllmetry

Length of output: 1275


Align pydantic dependency constraint with test requirements.

The main dependency pydantic = ">=2" allows v3.x, but test dependencies pin to <3, creating version constraint inconsistency. This means production could install Pydantic v3 while tests only validate v2.x compatibility.

Additionally, bumping to Pydantic v2 introduces breaking API changes, including the deprecation of parse_raw/parse_file and from_orm. Update the main dependency to pydantic = ">=2,<3" to match the test constraints, and ensure release notes document this as a breaking change for Pydantic v1 users.

jinja2 = "^3.1.5"
deprecated = "^1.2.14"
aiohttp = "^3.11.11"
31 changes: 31 additions & 0 deletions packages/traceloop-sdk/tests/test_json_encoder.py
@@ -0,0 +1,31 @@
from pathlib import Path
import pytest
from pydantic import BaseModel

from traceloop.sdk.decorators import task
from opentelemetry.semconv_ai import SpanAttributes


def test_json_encoder_task(exporter, recwarn):

    class TestValue(BaseModel):
        value: int

    @task(name="test_task")
    def test_method(a: TestValue, b: TestValue):
        return TestValue(value=a.value + b.value)

    result = test_method(TestValue(value=2), TestValue(value=3))

    assert result.value == 5

    spans = exporter.get_finished_spans()
    assert len(spans) == 1
    span = spans[0]
    assert span.attributes[SpanAttributes.TRACELOOP_ENTITY_INPUT] == r'{"args": ["{\"value\":2}", "{\"value\":3}"], "kwargs": {}}'
    assert span.attributes[SpanAttributes.TRACELOOP_ENTITY_OUTPUT] == r'"{\"value\":5}"'

    for warning in recwarn:
        file = Path(warning.filename)
        if file.name == "json_encoder.py" and "`json` method is deprecated" in str(warning.message):
            pytest.fail(f"Deprecation warning found: {warning.message}")
7 changes: 4 additions & 3 deletions packages/traceloop-sdk/traceloop/sdk/prompts/model.py
@@ -1,7 +1,7 @@
 import datetime
 from typing import List, Literal, Optional, Union

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, ConfigDict
 from typing_extensions import Annotated


@@ -10,8 +10,9 @@ class TemplateEngine:


 class RegistryObjectBaseModel(BaseModel):
-    class Config:
-        arbitrary_types_allowed = True
+    model_config = ConfigDict(
+        arbitrary_types_allowed=True
+    )


 class TextContent(RegistryObjectBaseModel):
3 changes: 3 additions & 0 deletions packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py
Member


we have this method in other packages as well - can you update them as well? Also, can you bump pydantic so the test will actually work with the right version?

@@ -14,6 +14,9 @@ def default(self, o):
         if hasattr(o, "to_json"):
             return o.to_json()

+        if hasattr(o, "model_dump_json"):
+            return o.model_dump_json()
+
         if hasattr(o, "json"):
             return o.json()