Port to `orjson` from `ujson` #8584

aadya940 · 2025-07-29T09:43:15Z

As noted in #8540, ujson is in maintenance mode.

kurtmckee · 2025-08-04T16:05:34Z

dspy/clients/databricks.py

@@ -318,7 +318,7 @@ def _save_data_to_local_file(train_data: list[dict[str, Any]], data_format: Trai
            elif data_format == TrainDataFormat.COMPLETION:
                _validate_completion_data(item)

-            f.write(ujson.dumps(item) + "\n")
+            f.write(orjson.dumps(item).decode() + "\n")


I recommend changing the open(file_path, "w"), above, to "wb" mode to eliminate the decodes happening here.

The end result will be:

Suggested change

f.write(orjson.dumps(item).decode() + "\n")

f.write(orjson.dumps(item) + b"\n")

chenmoneygithub · 2025-08-14

decode() is beneficial here: one nice benefit of saving state as JSON is that the file is human-readable, so users can copy/paste the demos/instructions into other parts of the pipeline. I have seen many users use DSPy only as a prompt optimization tool and export the optimized prompt (instruction + demos).

kurtmckee · 2025-08-14

.decode() is not doing the beneficial work that you think it's doing.

It's creating this pipeline:

with open(file_path, "w") as f:

Opening the file in text mode means that Python must internally encode the text when it actually writes the file. Python does NOT use UTF-8 for this (unless the locale or UTF-8 mode says so)! It defaults to the system's locale encoding. On Windows in the U.S., for example, this is usually cp1252, a close relative of ISO-8859-1.
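A quick way to see which codec a text-mode open() falls back to, and why it matters for JSON containing non-ASCII text. This sketch uses only the standard library; cp1252 stands in for a typical U.S. Windows locale, and can_encode is a hypothetical helper, not part of DSPy or orjson.

```python
import locale


def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if a text-mode file using `encoding` could write `text`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Codec that open(path, "w") uses when encoding= is omitted (outside UTF-8 mode).
print("text-mode default:", locale.getpreferredencoding(False))

line = '{"question":"café"}\n'
print(line.encode("utf-8"))   # b'{"question":"caf\xc3\xa9"}\n'
print(line.encode("cp1252"))  # b'{"question":"caf\xe9"}\n'

# orjson emits non-ASCII characters unescaped, so some content cannot be
# written through a cp1252 text-mode file at all.
print(can_encode('{"question":"中文"}\n', "cp1252"))  # False
```

Passing encoding="utf-8" to open() would pin the codec; binary mode goes further and skips the decode/re-encode entirely.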

orjson.dumps(item)

This produces a bytes object -- perfect for writing to disk!

orjson.dumps(item).decode()

This decodes the UTF-8 bytes into a str, and is a performance problem because Python must now re-encode the content before it can be written to disk. What a waste.

orjson.dumps(item).decode() + "\n"

Another performance problem -- the entire JSON string must now be copied in memory just so a newline can be added to the end in memory.

f.write(orjson.dumps(item).decode() + "\n")

Here's the full pipeline in this one line of code:

1. orjson produces a UTF-8-encoded bytes instance.
2. The bytes are decoded using UTF-8.
3. A new string is constructed in memory, with a newline added to the end.
4. Python re-encodes the string to the default system character set, which varies from one system to the next.
5. Python writes the bytes instance it just created to disk.
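The five steps can be replayed in memory to make the extra objects visible. A sketch under assumptions: it uses orjson when installed and otherwise a standard-library stand-in that produces the same compact UTF-8 bytes; dumps_bytes, text_mode_pipeline, and binary_mode_pipeline are hypothetical names, and text-mode newline translation is ignored.

```python
import json

try:
    import orjson

    def dumps_bytes(obj) -> bytes:
        return orjson.dumps(obj)

except ImportError:  # stand-in: compact JSON encoded as UTF-8, like orjson
    def dumps_bytes(obj) -> bytes:
        return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def text_mode_pipeline(item, encoding: str) -> bytes:
    """Bytes that f.write(orjson.dumps(item).decode() + "\\n") sends to a text-mode file."""
    raw = dumps_bytes(item)       # 1. serializer returns UTF-8 bytes
    text = raw.decode("utf-8")    # 2. bytes decoded into a new str
    line = text + "\n"            # 3. a second str copy, with the newline appended
    return line.encode(encoding)  # 4. re-encoded with the file's codec (5. then written)


def binary_mode_pipeline(item) -> bytes:
    """Bytes that f.write(orjson.dumps(item)); f.write(b"\\n") sends to a binary file."""
    return dumps_bytes(item) + b"\n"  # concatenated here only so results can be compared


item = {"question": "café"}
print(text_mode_pipeline(item, "utf-8") == binary_mode_pipeline(item))   # True
print(text_mode_pipeline(item, "cp1252") == binary_mode_pipeline(item))  # False
```

With a UTF-8 locale the bytes on disk end up identical, so the extra steps buy nothing; with cp1252 they silently change the file's encoding.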

The .decode() is not beneficial. It's a waste of cycles.

The most performant option that I'm aware of, which eliminates both the decode-with-UTF-8/re-encode-with-??? round trip and the construction of a new string in memory, is:

with open(file_path, "wb") as f:
    f.write(orjson.dumps(item))
    f.write(b"\n")
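A self-contained version of the binary-mode suggestion, written against a temporary file. save_items is a hypothetical stand-in for the real DSPy function, and the standard-library fallback exists only so the sketch runs without orjson installed.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson

    def dumps_bytes(obj) -> bytes:
        return orjson.dumps(obj)

except ImportError:  # stand-in producing the same compact UTF-8 bytes as orjson
    def dumps_bytes(obj) -> bytes:
        return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def save_items(items, file_path) -> None:
    """Hypothetical writer: one JSON object per line, no decode/re-encode step."""
    with open(file_path, "wb") as f:
        for item in items:
            f.write(dumps_bytes(item))  # serializer bytes go straight to the file
            f.write(b"\n")              # separate write, so no concatenated copy


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.jsonl"
    save_items([{"prompt": "hi"}, {"prompt": "café"}], path)
    print(path.read_bytes())  # b'{"prompt":"hi"}\n{"prompt":"caf\xc3\xa9"}\n'
```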

chenmoneygithub · 2025-08-14

Yes, I am aware of the performance issue. I was worried that the JSON file would not be readable when opened, but that's not really the case, since most text editors decode files as UTF-8 by default.
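To check the readability concern: orjson writes non-ASCII characters as raw UTF-8 rather than \uXXXX escapes, so a file written in binary mode decodes back to the original characters. A sketch with a standard-library fallback when orjson is not installed; the demo content is made up, and the json.dumps line is only there for contrast with an escaping serializer.

```python
import json

try:
    import orjson

    def dumps_bytes(obj) -> bytes:
        return orjson.dumps(obj)

except ImportError:  # stand-in with orjson-like output
    def dumps_bytes(obj) -> bytes:
        return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


demo = {"instructions": "Réponds en français", "demos": ["naïve → ok"]}
raw = dumps_bytes(demo) + b"\n"

# What a UTF-8 editor shows: the characters themselves.
readable = raw.decode("utf-8")
# '{"instructions":"Réponds en français","demos":["naïve → ok"]}\n'

# For contrast, the standard library's default (ensure_ascii=True) escapes them.
escaped = json.dumps(demo, separators=(",", ":"))
# '{"instructions":"R\u00e9ponds en fran\u00e7ais","demos":["na\u00efve \u2192 ok"]}'
```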

kurtmckee · 2025-08-04T16:06:28Z

dspy/clients/utils_finetune.py

    with open(file_path, "w") as f:
        for item in data:
-            f.write(ujson.dumps(item) + "\n")
+            f.write(orjson.dumps(item).decode() + "\n")


As above, I recommend eliminating the calls to .decode():

with open(file_path, "wb") as f:
    for item in data:
        f.write(orjson.dumps(item) + b"\n")
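A variant of the same idea that swaps in orjson's OPT_APPEND_NEWLINE option, which makes orjson emit the trailing newline itself, together with the file object's writelines(). write_jsonl is a hypothetical helper, and the fallback only mimics the orjson output when orjson is missing.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson

    def dumps_line(obj) -> bytes:
        # orjson appends b"\n" itself, so no `+ b"\n"` copy is needed.
        return orjson.dumps(obj, option=orjson.OPT_APPEND_NEWLINE)

except ImportError:  # stand-in with the same output as the orjson call above
    def dumps_line(obj) -> bytes:
        return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8") + b"\n"


def write_jsonl(data, file_path) -> None:
    """Hypothetical helper: stream every item to a binary file, one per line."""
    with open(file_path, "wb") as f:
        f.writelines(dumps_line(item) for item in data)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.jsonl"
    write_jsonl([{"id": 1}, {"id": 2}], path)
    print(path.read_bytes())  # b'{"id":1}\n{"id":2}\n'
```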

This suggestion applies to the other change in this file.

kurtmckee · 2025-08-04T16:09:47Z

dspy/primitives/base_module.py

            with open(path, encoding="utf-8") as f:
-                state = ujson.loads(f.read())
+                state = orjson.loads(f.read().encode('utf-8'))


This is the last place I'll leave feedback, but in general the suggestion is "don't decode content only to re-encode it immediately":

with open(path, "rb") as f:
    state = orjson.loads(f.read())

Also, these are pathlib.Path objects, so this is even better:

state = orjson.loads(path.read_bytes())
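A sketch of the reading side, assuming path is a pathlib.Path as the review says. orjson.loads accepts bytes directly, so no .decode()/.encode() round trip is needed; load_state is a hypothetical name, and json.loads (which also accepts UTF-8 bytes) is used only when orjson is not installed.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson

    loads = orjson.loads  # accepts bytes, bytearray, memoryview, or str
except ImportError:
    loads = json.loads  # the standard library also accepts UTF-8 bytes


def load_state(path: Path) -> dict:
    """Hypothetical loader: read raw bytes and parse them without decoding first."""
    return loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "program.json"
    path.write_bytes('{"signature": {"instructions": "Résume le texte."}}'.encode("utf-8"))
    state = load_state(path)
    print(state["signature"]["instructions"] == "Résume le texte.")  # True
```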

okhat · 2025-08-10T20:23:37Z

Thanks so much @aadya940 and thanks @kurtmckee for the comments!

@aadya940 Mind addressing the failures (ruff mostly? or maybe some tests?) and checking out the comments? (I didn't dive into them.)


chenmoneygithub · 2025-08-14T08:12:15Z

@aadya940 Thanks for the PR! My main concern is whether, after switching to orjson, we can still load programs saved before this PR. I am running some tests.
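One way to probe that concern is a round-trip check. Files saved before the PR came from ujson, whose default ensure_ascii=True escapes non-ASCII characters; files saved after it contain raw UTF-8. A minimal sketch, assuming the standard library's json.dumps (whose default also escapes) can stand in for an old ujson-written file. same_after_migration is hypothetical, and the check does not cover every behavioral difference between the two libraries.

```python
import json

try:
    import orjson

    loads = orjson.loads

    def dumps_bytes(obj) -> bytes:
        return orjson.dumps(obj)

except ImportError:  # stand-in so the sketch runs without orjson
    loads = json.loads

    def dumps_bytes(obj) -> bytes:
        return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def same_after_migration(state: dict) -> bool:
    """Hypothetical check: an old-style (escaped, indented) file and a new-style
    (raw UTF-8, compact) file must both load back to the original object."""
    old_style = json.dumps(state, ensure_ascii=True, indent=2).encode("ascii")
    new_style = dumps_bytes(state)
    return loads(old_style) == loads(new_style) == state


state = {"demos": [{"question": "¿Qué?", "answer": "42"}], "lm": None}
print(same_after_migration(state))  # True
```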

port to orjson from ujson

849abe7

kurtmckee reviewed Aug 4, 2025


0xEval pushed a commit to 0xEval/dspy-orjson-mig that referenced this pull request Aug 13, 2025

feat: migrate deprecated ujson to orjson for pyodide support PR stanf…

c12dace

…ordnlp#8584

chenmoneygithub mentioned this pull request Aug 14, 2025

Replace ujson by orjson #8655

Open

okhat closed this Aug 14, 2025
