-
Notifications
You must be signed in to change notification settings - Fork 853
fix(sdk): Updates to support Pydantic v3 #3517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(sdk): Updates to support Pydantic v3 #3517
Conversation
… method of pydantic models
WalkthroughAdds support for serializing objects that expose a Pydantic v2 Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed everything up to 165cee0 in 2 minutes and 9 seconds. Click for details.
- Reviewed
51lines of code in2files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/tests/test_json_encoder.py:25
- Draft comment:
Consider using json.loads for comparison instead of raw string equality to avoid issues with escaping differences. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
2. packages/traceloop-sdk/tests/test_json_encoder.py:28
- Draft comment:
Using recwarn loop to check deprecation warnings works, though consider a more direct pytest.warns approach for clarity. - Reason this comment was not posted:
Confidence changes required:30%<= threshold50%None
3. packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py:17
- Draft comment:
Check that 'model_dump_json' is callable before invoking it to prevent potential attribute errors. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% The comment suggests adding a callable check, but the existing code already uses the same pattern forto_json()andjson()without checking if they're callable. This is a consistent pattern in the file. Ifmodel_dump_jsonexists as a non-callable attribute, it would raise an error, but the same would be true for the other methods. The comment is asking for a defensive check that isn't applied to the existing similar code. This seems like an overly cautious suggestion that would be inconsistent with the existing code style. Additionally,model_dump_jsonis a well-known Pydantic v2 method that is always callable when it exists, so this check is unnecessary in practice. Perhaps there's a specific reason to be more defensive withmodel_dump_jsoncompared to the other methods? Maybe there's been a historical issue with this pattern that I'm not aware of? The comment could be highlighting a broader issue that affects all three method calls. Even if there were historical issues, the comment only targets the new code and not the existing similar patterns. If this were a real concern, it should apply to all three method checks. The comment is inconsistent with the established pattern in the file and doesn't provide evidence of why this specific addition needs different treatment. This comment should be deleted. It suggests a defensive check that is inconsistent with the existing code pattern used forto_json()andjson(). The comment doesn't provide justification for whymodel_dump_jsonspecifically needs this check when the other methods don't. It's an overly cautious suggestion without strong evidence.
Workflow ID: wflow_XSrwMnzjHgOpFQqV
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
nirga
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ryanamannion! Left 2 comments :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have this method in other packages as well - can you update them as well? Also, can you bump pydantic so the test will actually work with the right version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.pypackages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.pypackages/traceloop-sdk/pyproject.tomlpackages/traceloop-sdk/traceloop/sdk/prompts/model.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/prompts/model.pypackages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.pypackages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
Applied to files:
packages/traceloop-sdk/pyproject.toml
🔇 Additional comments (2)
packages/traceloop-sdk/traceloop/sdk/prompts/model.py (2)
4-4: LGTM: Correct import for Pydantic v2.Adding
ConfigDictto the imports is the correct approach for Pydantic v2 model configuration.
13-15: LGTM: Proper Pydantic v2 configuration pattern.This correctly migrates from the deprecated inner
Configclass to the Pydantic v2model_configattribute withConfigDict. Thearbitrary_types_allowed=Truesetting is properly preserved.
| if hasattr(o, "model_dump_json"): | ||
| return o.model_dump_json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py | head -100Repository: traceloop/openllmetry
Length of output: 3645
Replace model_dump_json() with model_dump().
The model_dump_json() method returns a JSON string, but JSONEncoder.default() expects a Python object (dict, list, etc.). Returning a JSON string causes it to be escaped and encoded again by json.dumps(), resulting in malformed JSON data in span attributes.
Use model_dump() instead, which returns a dictionary:
- if hasattr(o, "model_dump_json"):
- return o.model_dump_json()
+ if hasattr(o, "model_dump"):
+ return o.model_dump()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if hasattr(o, "model_dump_json"): | |
| return o.model_dump_json() | |
| if hasattr(o, "model_dump"): | |
| return o.model_dump() |
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py
around lines 18-19, the code calls model_dump_json() which returns a JSON string
and causes double-encoding when JSONEncoder.default() hands the value to
json.dumps(); change the call to model_dump() so a Python dict/list is returned
instead (and ensure the returned object is JSON-serializable before returning).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a good point that potentially applies to the original change in sdk as well.
When we call model_dump_json we are returned a string representation of that class. If we instead call o.model_dump(mode='json') we are returned a python dictionary where each value is ensured to be a JSON-dumpable type. (See the documentation here).
Here is an example of what calling o.model_dump(mode='json') instead of o.model_dump_json() looks like in Traceloop:

Compare this with the screenshot in the initial PR comment, or here in the span listing for a before/ after of the original implementation vs. calling o.model_dump(mode='json')

IMO model_dump is preferable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check all three JSONEncoder files for model_dump_json usage
echo "=== traceloop-sdk/utils/json_encoder.py ==="
rg -n "model_dump_json" packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py -C 2
echo -e "\n=== haystack/utils.py ==="
rg -n "model_dump_json" packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py -C 2
echo -e "\n=== llamaindex/utils.py ==="
rg -n "model_dump_json" packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py -C 2
Length of output: 702
@ryanamannion, thank you for the detailed analysis and the visual evidence! You're absolutely right that model_dump(mode='json') is the better approach here. The screenshots clearly show the improvement—avoiding the double-encoding issue where the JSON string gets escaped again.
All three JSON encoder files currently use model_dump_json() and should be updated consistently to use model_dump(mode='json'):
- packages/traceloop-sdk/traceloop/sdk/utils/json_encoder.py (lines 17-18)
- packages/opentelemetry-instrumentation-haystack/opentelemetry/instrumentation/haystack/utils.py (lines 18-19)
- packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py (lines 67-68)
The suggested change for each file:
if hasattr(o, "model_dump"):
return o.model_dump(mode='json')This ensures that:
- A Python dictionary is returned (not a JSON string)
- All values in the dictionary are JSON-serializable
- The JSON encoder can properly encode the structure without double-encoding
The mode='json' parameter is crucial as it handles types like datetime, UUID, etc., converting them to JSON-compatible representations.
| if hasattr(o, "model_dump_json"): | ||
| return o.model_dump_json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Use model_dump() instead of model_dump_json().
Same issue as in the haystack utils file: model_dump_json() returns a JSON string, but JSONEncoder.default() expects a Python object. This causes double-encoding when json.dumps() is called with this encoder (lines 81, 90).
🔎 Proposed fix
- if hasattr(o, "model_dump_json"):
- return o.model_dump_json()
+ if hasattr(o, "model_dump"):
+ return o.model_dump()🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py
around lines 67-68, the encoder currently calls model_dump_json() which returns
a JSON string; replace it with model_dump() so the encoder returns a Python
object (dict/list) as expected by JSONEncoder.default(), avoiding
double-encoding when json.dumps() is invoked later (see uses around lines 81 and
90).
| elif hasattr(o, "json"): | ||
| return o.json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review the .json() method for potential double-encoding.
The .json() method (Pydantic v1/v2) also returns a JSON string, not a dict. This has the same double-encoding issue as model_dump_json(). Consider using .dict() (Pydantic v1) or checking which method is appropriate for the object type.
For Pydantic models specifically:
- v1: use
.dict()instead of.json() - v2: use
.model_dump()instead of.model_dump_json()
🔎 Suggested approach for backward compatibility
- if hasattr(o, "model_dump_json"):
- return o.model_dump_json()
- elif hasattr(o, "json"):
- return o.json()
+ if hasattr(o, "model_dump"):
+ return o.model_dump()
+ elif hasattr(o, "dict"):
+ return o.dict()
elif hasattr(o, "to_json"):
return o.to_json()Note: Keep the to_json() branch for non-Pydantic objects that have custom serialization.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-llamaindex/opentelemetry/instrumentation/llamaindex/utils.py
around lines 69-70, the current branch calling o.json() can produce a JSON
string and cause double-encoding; update the logic to detect Pydantic models and
prefer .dict() for v1 or .model_dump() for v2 (e.g., check for attributes like
model_dump/model_dump_json or dict/json and call the non-JSON-returning method),
falling back to the existing to_json() or .json() only for non-Pydantic objects
that intentionally return a JSON string; ensure the branch checks the right
attributes in order (model_dump -> dict -> to_json -> json) and returns a dict
rather than a JSON string.
| colorama = "^0.4.6" | ||
| tenacity = ">=8.2.3, <10.0" | ||
| pydantic = ">=1" | ||
| pydantic = ">=2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Read the pyproject.toml around the specified lines
sed -n '65,95p' packages/traceloop-sdk/pyproject.toml | cat -nRepository: traceloop/openllmetry
Length of output: 1275
Align pydantic dependency constraint with test requirements.
The main dependency pydantic = ">=2" allows v3.x, but test dependencies pin to <3, creating version constraint inconsistency. This means production could install Pydantic v3 while tests only validate v2.x compatibility.
Additionally, bumping to Pydantic v2 introduces breaking API changes including deprecated parse_raw/parse_file and from_orm methods. Update the main dependency to pydantic = ">=2,<3" to match test constraints, and ensure release notes document this as a breaking change for Pydantic v1 users.
|
Thanks for the feedback. I went ahead and added a catch for Implementations seem to vary. Some check I bumped pydantic to >=2 as requested, to make sure that the tests run with the right version. I'm wondering if this should be done at the test level, to test that the changes work with multiple versions of pydantic. The changes as implemented should continue to support pydantic v1. While we are here, I also changed configuration of a BaseModel instance in sdk/prompts to use the v2 method of configuration via EDIT: it seems I have a few critical issues from CodeRabbit. I will take a look at these. |
Addresses Issue #3516
Continuing the example from the above Issue, the following is the appearance of a trace with the changes from this PR.

feat(instrumentation): ...orfix(instrumentation): ....Important
Updates
JSONEncoderto support Pydantic v3 by addingmodel_dump_jsonmethod handling and adds tests for verification.JSONEncoderinjson_encoder.pyto support Pydantic v3 by addingmodel_dump_jsonmethod handling.jsonmethod handling.test_json_encoder_taskintest_json_encoder.pyto verifyJSONEncoderwith Pydantic models.jsonmethod intest_json_encoder.py.This description was created by
for 165cee0. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.