
Conversation


@ryanamannion ryanamannion commented Dec 12, 2025

Addresses Issue #3516

Continuing the example from that issue, the screenshot below shows how a trace appears with the changes from this PR.
[screenshot: trace with this PR's changes]

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Updates JSONEncoder to support Pydantic v3 by adding model_dump_json method handling and adds tests for verification.

  • Behavior:
    • Updates JSONEncoder in json_encoder.py to support Pydantic v3 by adding model_dump_json method handling.
    • Ensures backward compatibility by retaining json method handling.
  • Tests:
    • Adds test_json_encoder_task in test_json_encoder.py to verify JSONEncoder with Pydantic models.
    • Checks for deprecation warnings related to json method in test_json_encoder.py.
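The deprecation-warning check can be sketched with the standard library alone. This is a simplified stand-in for the actual pytest-based test; LegacyModel and Encoder below are hypothetical stubs, not the SDK's code:

```python
import json
import warnings

class LegacyModel:
    # Hypothetical stub mimicking a model whose .json() method emits a
    # DeprecationWarning, as Pydantic v2 models do.
    def json(self):
        warnings.warn("The `json` method is deprecated; use `model_dump_json` instead.",
                      DeprecationWarning)
        return '{"x": 1}'

    def model_dump_json(self):
        return '{"x": 1}'

class Encoder(json.JSONEncoder):
    def default(self, o):
        # Prefer the v2 method so the deprecated .json() path is never hit.
        if hasattr(o, "model_dump_json"):
            return o.model_dump_json()
        elif hasattr(o, "json"):
            return o.json()
        return super().default(o)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    json.dumps(LegacyModel(), cls=Encoder)

deprecations = [w for w in caught
                if issubclass(w.category, DeprecationWarning) and "json" in str(w.message)]
assert not deprecations  # the deprecated .json() path was never taken
```

The real test uses pytest's recwarn fixture, but the idea is the same: serialize a model through the encoder and fail if any `.json()` deprecation warning is recorded.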

This description was created by Ellipsis for 165cee0.

Summary by CodeRabbit

  • New Features

    • JSON encoder now supports additional object serialization formats, improving compatibility with newer model types.
  • Tests

    • Added tests covering JSON encoder behavior, OpenTelemetry span attributes, and deprecation-warning checks.
  • Chores

    • Updated code to be compatible with Pydantic v2 (model configuration adjusted).



CLAassistant commented Dec 12, 2025

CLA assistant check
All committers have signed the CLA.


coderabbitai bot commented Dec 12, 2025

Walkthrough

Adds support for serializing objects that expose a Pydantic v2 model_dump_json() method across JSON encoders, adds a test verifying encoder behavior and emitted span attributes, updates a Pydantic dependency bound to >=2, and replaces an inner Pydantic Config with model_config = ConfigDict(...).

Changes

  • JSON encoder updates (packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py, packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py, packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py): each JSON encoder's default() now checks for and returns o.model_dump_json() when present (inserted before the existing .json handling) to support Pydantic v2 model serialization.
  • New test (packages/traceloop-sdk/tests/test_json_encoder.py): adds a test that defines a Pydantic model and a decorated task, executes it, asserts the returned value and OpenTelemetry span attributes for input/output serialization, and fails if a deprecation warning about .json() is emitted.
  • Pydantic dependency bump (packages/traceloop-sdk/pyproject.toml): changes the Pydantic version constraint from >=1 to >=2.
  • Pydantic model config update (packages/traceloop-sdk/traceloop/sdk/prompts/model.py): replaces an inner Config class using arbitrary_types_allowed = True with model_config = ConfigDict(arbitrary_types_allowed=True) and adds ConfigDict to the imports.
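Concretely, the fallback chain in each encoder's default() looks roughly like the sketch below. FakeV2Model is a hypothetical stub standing in for a Pydantic v2 model; the real files differ in surrounding detail:

```python
import json

class SDKJSONEncoder(json.JSONEncoder):
    """Sketch of the fallback chain: Pydantic v2, then v1/legacy, then custom."""
    def default(self, o):
        if hasattr(o, "model_dump_json"):   # Pydantic v2
            return o.model_dump_json()
        elif hasattr(o, "json"):            # Pydantic v1 / legacy
            return o.json()
        elif hasattr(o, "to_json"):         # non-Pydantic custom serialization
            return o.to_json()
        return super().default(o)

# Hypothetical stand-in for a Pydantic v2 model:
class FakeV2Model:
    def model_dump_json(self):
        return '{"question": "What is SQL?"}'

print(json.dumps(FakeV2Model(), cls=SDKJSONEncoder))
```

Note that returning a JSON string from default() causes the double-encoding issue discussed in the review comments below; this sketch mirrors the PR as originally submitted, not its final form.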

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Pay attention to compatibility implications of switching to model_dump_json() vs .json() across multiple encoder implementations.
  • Verify tests correctly capture OpenTelemetry span attributes and that warning checks reliably detect deprecated .json() usage.
  • Review pyproject.toml change for downstream dependency impacts and ensure other code using Pydantic is updated for v2 semantics.
  • Confirm model_config = ConfigDict(...) usage matches project-wide Pydantic v2 patterns.

Poem

🐰 I hopped through fields of code tonight,
Found model_dump_json shining bright,
Encoders pranced, the tests did clap,
Spans recorded every map.
No warnings now — a carrot snack! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main objective of the changeset: updating the SDK to support Pydantic v3 by adding model_dump_json handling and updating configuration patterns.


Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 165cee0 in 2 minutes and 9 seconds.
  • Reviewed 51 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/traceloop-sdk/tests/test_json_encoder.py:25
  • Draft comment:
    Consider using json.loads for comparison instead of raw string equality to avoid issues with escaping differences.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
2. packages/traceloop-sdk/tests/test_json_encoder.py:28
  • Draft comment:
    Using recwarn loop to check deprecation warnings works, though consider a more direct pytest.warns approach for clarity.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50%
3. packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py:17
  • Draft comment:
    Check that 'model_dump_json' is callable before invoking it to prevent potential attribute errors.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 15% vs. threshold = 50%). The comment suggests adding a callable check, but the existing code already uses the same hasattr pattern for to_json() and json() without checking callability, so the suggestion is inconsistent with the file's established style. In practice, model_dump_json is a well-known Pydantic v2 method that is always callable when it exists; if such a defensive check were warranted, it would apply to all three branches, not just the new one. The comment should be deleted: it is an overly cautious suggestion without strong evidence.

Workflow ID: wflow_XSrwMnzjHgOpFQqV


Member

@nirga nirga left a comment


Thanks @ryanamannion! Left 2 comments :)

Member


we have this method in other packages as well - can you update them as well? Also, can you bump pydantic so the test will actually work with the right version?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between 165cee0 and 30ee5ad.

📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py
  • packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py
  • packages/traceloop-sdk/pyproject.toml
  • packages/traceloop-sdk/traceloop/sdk/prompts/model.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/traceloop-sdk/traceloop/sdk/prompts/model.py
  • packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py
  • packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.

Applied to files:

  • packages/traceloop-sdk/pyproject.toml
🔇 Additional comments (2)
packages/traceloop-sdk/traceloop/sdk/prompts/model.py (2)

4-4: LGTM: Correct import for Pydantic v2.

Adding ConfigDict to the imports is the correct approach for Pydantic v2 model configuration.


13-15: LGTM: Proper Pydantic v2 configuration pattern.

This correctly migrates from the deprecated inner Config class to the Pydantic v2 model_config attribute with ConfigDict. The arbitrary_types_allowed=True setting is properly preserved.

Comment on lines +18 to +19

    if hasattr(o, "model_dump_json"):
        return o.model_dump_json()

@coderabbitai coderabbitai bot Dec 22, 2025


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py | head -100

Repository: traceloop/openllmetry

Length of output: 3645


Replace model_dump_json() with model_dump().

The model_dump_json() method returns a JSON string, but JSONEncoder.default() expects a Python object (dict, list, etc.). Returning a JSON string causes it to be escaped and encoded again by json.dumps(), resulting in malformed JSON data in span attributes.

Use model_dump() instead, which returns a dictionary:

-        if hasattr(o, "model_dump_json"):
-            return o.model_dump_json()
+        if hasattr(o, "model_dump"):
+            return o.model_dump()
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py around lines 18-19, the code calls model_dump_json(), which returns a JSON string and causes double-encoding when JSONEncoder.default() hands the value to json.dumps(); change the call to model_dump() so a Python dict/list is returned instead (and ensure the returned object is JSON-serializable before returning).
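The double-encoding problem described above can be reproduced without Pydantic at all. Stub below is a hypothetical stand-in for a model exposing both v2 serialization methods:

```python
import json

class Stub:
    # Hypothetical stand-ins for the two Pydantic v2 serialization methods.
    def model_dump_json(self):
        return '{"x": 1}'
    def model_dump(self, mode="python"):
        return {"x": 1}

class BadEncoder(json.JSONEncoder):
    def default(self, o):
        return o.model_dump_json()   # returns a str, so json.dumps escapes it again

class GoodEncoder(json.JSONEncoder):
    def default(self, o):
        return o.model_dump()        # returns a dict, encoded as a JSON object

print(json.dumps(Stub(), cls=BadEncoder))   # "{\"x\": 1}"  (a quoted, escaped string)
print(json.dumps(Stub(), cls=GoodEncoder))  # {"x": 1}      (a proper JSON object)
```

The BadEncoder output is a JSON *string* containing escaped JSON, which is exactly the malformed span-attribute data the review flags.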

Author


This is actually a good point that potentially applies to the original change in sdk as well.

When we call model_dump_json we are returned a string representation of that class. If we instead call o.model_dump(mode='json') we are returned a python dictionary where each value is ensured to be a JSON-dumpable type. (See the documentation here).

Here is an example of what calling o.model_dump(mode='json') instead of o.model_dump_json() looks like in Traceloop:
[screenshot: span attributes produced with model_dump(mode='json')]

Compare this with the screenshot in the initial PR comment, or here in the span listing, for a before/after of the original implementation vs. calling o.model_dump(mode='json'):
[screenshot: before/after span listing]

IMO model_dump is preferable



🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check all three JSONEncoder files for model_dump_json usage
echo "=== traceloop-sdk/utils/json_encoder.py ==="
rg -n "model_dump_json" packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py -C 2

echo -e "\n=== haystack/utils.py ==="
rg -n "model_dump_json" packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py -C 2

echo -e "\n=== llamaindex/utils.py ==="
rg -n "model_dump_json" packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py -C 2

Length of output: 702


@ryanamannion, thank you for the detailed analysis and the visual evidence! You're absolutely right that model_dump(mode='json') is the better approach here. The screenshots clearly show the improvement—avoiding the double-encoding issue where the JSON string gets escaped again.

All three JSON encoder files currently use model_dump_json() and should be updated consistently to use model_dump(mode='json'):

  1. packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py (lines 17-18)
  2. packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py (lines 18-19)
  3. packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py (lines 67-68)

The suggested change for each file:

if hasattr(o, "model_dump"):
    return o.model_dump(mode='json')

This ensures that:

  • A Python dictionary is returned (not a JSON string)
  • All values in the dictionary are JSON-serializable
  • The JSON encoder can properly encode the structure without double-encoding

The mode='json' parameter is crucial as it handles types like datetime, UUID, etc., converting them to JSON-compatible representations.
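The difference mode='json' makes can be illustrated with a stub. EventStub is hypothetical and only mimics the behavior; a real Pydantic v2 model performs this coercion automatically:

```python
import json
from datetime import datetime, timezone

class EventStub:
    # Hypothetical stand-in for a Pydantic model with a datetime field.
    created = datetime(2025, 12, 22, tzinfo=timezone.utc)

    def model_dump(self, mode="python"):
        # mode="python" keeps native types; mode="json" coerces values to
        # JSON-compatible representations, as Pydantic v2's model_dump does.
        value = self.created.isoformat() if mode == "json" else self.created
        return {"created": value}

print(json.dumps(EventStub().model_dump(mode="json")))  # {"created": "2025-12-22T00:00:00+00:00"}

try:
    json.dumps(EventStub().model_dump())  # native datetime: not JSON-serializable
except TypeError as e:
    print("TypeError:", e)
```

Without mode='json', the dict can still contain values (datetime, UUID, etc.) that json.dumps cannot encode, so the span attribute write would fail.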

Comment on lines +67 to +68

    if hasattr(o, "model_dump_json"):
        return o.model_dump_json()


⚠️ Potential issue | 🔴 Critical

Critical: Use model_dump() instead of model_dump_json().

Same issue as in the haystack utils file: model_dump_json() returns a JSON string, but JSONEncoder.default() expects a Python object. This causes double-encoding when json.dumps() is called with this encoder (lines 81, 90).

🔎 Proposed fix
-        if hasattr(o, "model_dump_json"):
-            return o.model_dump_json()
+        if hasattr(o, "model_dump"):
+            return o.model_dump()
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py around lines 67-68, the encoder currently calls model_dump_json(), which returns a JSON string; replace it with model_dump() so the encoder returns a Python object (dict/list) as expected by JSONEncoder.default(), avoiding double-encoding when json.dumps() is invoked later (see uses around lines 81 and 90).

Comment on lines 69 to 70

    elif hasattr(o, "json"):
        return o.json()


⚠️ Potential issue | 🔴 Critical

Review the .json() method for potential double-encoding.

The .json() method (Pydantic v1/v2) also returns a JSON string, not a dict. This has the same double-encoding issue as model_dump_json(). Consider using .dict() (Pydantic v1) or checking which method is appropriate for the object type.

For Pydantic models specifically:

  • v1: use .dict() instead of .json()
  • v2: use .model_dump() instead of .model_dump_json()
🔎 Suggested approach for backward compatibility
-        if hasattr(o, "model_dump_json"):
-            return o.model_dump_json()
-        elif hasattr(o, "json"):
-            return o.json()
+        if hasattr(o, "model_dump"):
+            return o.model_dump()
+        elif hasattr(o, "dict"):
+            return o.dict()
         elif hasattr(o, "to_json"):
             return o.to_json()

Note: Keep the to_json() branch for non-Pydantic objects that have custom serialization.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py around lines 69-70, the current branch calling o.json() can produce a JSON string and cause double-encoding; update the logic to detect Pydantic models and prefer .dict() for v1 or .model_dump() for v2 (e.g., check for attributes like model_dump/model_dump_json or dict/json and call the non-JSON-returning method), falling back to the existing to_json() or .json() only for non-Pydantic objects that intentionally return a JSON string; ensure the branch checks the right attributes in order (model_dump -> dict -> to_json -> json) and returns a dict rather than a JSON string.
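Putting the review's ordering together, a backward-compatible default() might look like this sketch (the v1/v2 stubs are hypothetical; the ordering is the point):

```python
import json

class CompatEncoder(json.JSONEncoder):
    def default(self, o):
        # Order matters: prefer methods that return Python objects,
        # and fall back to string-returning methods only for non-Pydantic types.
        if hasattr(o, "model_dump"):    # Pydantic v2 -> dict
            return o.model_dump(mode="json")
        elif hasattr(o, "dict"):        # Pydantic v1 -> dict
            return o.dict()
        elif hasattr(o, "to_json"):     # custom non-Pydantic serialization
            return o.to_json()
        return super().default(o)

class V1Stub:
    def dict(self):
        return {"v": 1}

class V2Stub:
    def model_dump(self, mode="python"):
        return {"v": 2}

print(json.dumps([V1Stub(), V2Stub()], cls=CompatEncoder))  # [{"v": 1}, {"v": 2}]
```

Both stubs serialize to plain objects with no escaping, regardless of which Pydantic generation they imitate.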

  colorama = "^0.4.6"
  tenacity = ">=8.2.3, <10.0"
- pydantic = ">=1"
+ pydantic = ">=2"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Read the pyproject.toml around the specified lines
sed -n '65,95p' packages/traceloop-sdk/pyproject.toml | cat -n

Repository: traceloop/openllmetry

Length of output: 1275


Align pydantic dependency constraint with test requirements.

The main dependency pydantic = ">=2" allows v3.x, but test dependencies pin to <3, creating version constraint inconsistency. This means production could install Pydantic v3 while tests only validate v2.x compatibility.

Additionally, bumping to Pydantic v2 introduces breaking API changes including deprecated parse_raw/parse_file and from_orm methods. Update the main dependency to pydantic = ">=2,<3" to match test constraints, and ensure release notes document this as a breaking change for Pydantic v1 users.

Author

ryanamannion commented Dec 22, 2025

Thanks for the feedback.

I went ahead and added a catch for model_dump_json to a few JSON-encoder-type classes that were missing one. I did find that a number of instrumentation packages already have support for this, namely OpenAI, Anthropic, cohere, and langchain.

Implementations seem to vary. Some check if hasattr(obj, "model_dump_json") and callable(obj.model_dump_json): while others check if isinstance(o, BaseModel) and hasattr(o, "model_dump_json"):. I suppose there is an off chance that you might have an object that defines a "model_dump_json" method that is not callable? This seems unlikely to me, but I can guard against it if you prefer.
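The difference between the two guard styles can be sketched with stdlib-only stubs (both classes here are hypothetical):

```python
def has_v2_dump(o):
    # hasattr + callable guard, as used in some instrumentation packages.
    fn = getattr(o, "model_dump_json", None)
    return callable(fn)

class Normal:
    def model_dump_json(self):
        return "{}"

class Weird:
    # Pathological case: the attribute exists but is not callable.
    model_dump_json = "not a method"

print(has_v2_dump(Normal()))  # True
print(has_v2_dump(Weird()))   # False, whereas a plain hasattr() check would say True
```

The isinstance(o, BaseModel) variant rules out the pathological case differently, by requiring an actual Pydantic model before touching the attribute at all.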

I bumped pydantic to >=2 as requested, to make sure that the tests run with the right version. I'm wondering if this should be done at the test level, to test that the changes work with multiple versions of pydantic. The changes as implemented should continue to support pydantic v1.

While we are here, I also changed configuration of a BaseModel instance in sdk/prompts to use the v2 method of configuration via ConfigDict. This also raised warnings in my testing.

EDIT: it seems I have a few critical issues from CodeRabbit. I will take a look at these.

