Clarify UTF encoding between C# strings and Godot Strings #10920

Merged
skyace65 merged 3 commits into godotengine:master from Athenr:master
Jul 18, 2025

Conversation

Athenr (Contributor) commented May 2, 2025

Fix #7682

Updated c_sharp_differences.rst to include the difference between the UTF encoding for C# strings and Godot Strings.

Athenr added 2 commits May 2, 2025 16:14
…coding

Clarified that C# System.String uses UTF-16 encoding while Godot String uses UTF-32.
…dot-String-UTF-encoding

Update c_sharp_differences.rst with C# string and Godot String UTF encoding
skyace65 added the enhancement, topic:dotnet, and area:manual labels May 3, 2025
skyace65 requested a review from a team May 3, 2025 01:13
Revising for grammatical fixes in changes.

Co-authored-by: A Thousand Ships <96648715+AThousandShips@users.noreply.github.com>

raulsntos (Member) left a comment

Thanks for contributing to the C# documentation. I think we had in mind something more extensive that clarifies why this can be a problem.

Normally the encoding difference is not a problem, since we convert between C# strings and Godot strings automatically; ideally users wouldn't even need to think about it. If that were all there is to say, mentioning the difference would not be important.

We wanted to add a note about this in the documentation because of the problems that it may cause in some APIs. The example given in the discussion from #7612 was TextServer::string_get_word_breaks. This API breaks the text into words and returns an array of character indices, but these indices will be wrong for C# strings whenever a character outside the Basic Multilingual Plane precedes them.

For example, for the string "ℌ𝔢𝔩𝔩𝔬 𝔚𝔬𝔯𝔩𝔡" the returned array would be [0, 5, 6, 11]. Those indices count Unicode code points, so they don't line up with the C# string, where every character except "ℌ" and the space takes two UTF-16 code units (a surrogate pair). In C# the indices should be [0, 9, 10, 20], or you should enumerate the string as System.Text.Rune values instead.
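
The index mismatch described here can be worked around by translating each code point index into a UTF-16 index before using it on a C# string. The following is only a minimal sketch of that idea, not something taken from the PR or from Godot's API: the class name `CodePointIndex` and the method `ToUtf16Index` are hypothetical, and it relies only on `string.EnumerateRunes()` and `Rune.Utf16SequenceLength` from `System.Text` (available in .NET Core 3.0 and later).

```csharp
using System.Text;

// Hypothetical helper (not part of Godot or .NET): the names are illustrative only.
public static class CodePointIndex
{
    // Converts an index counted in Unicode code points (how a Godot String
    // counts characters) into an index counted in UTF-16 code units (how a
    // C# string is indexed). If codePointIndex is past the end of the text,
    // the result is text.Length.
    public static int ToUtf16Index(string text, int codePointIndex)
    {
        int utf16Index = 0;
        int codePointsSeen = 0;

        foreach (Rune rune in text.EnumerateRunes())
        {
            if (codePointsSeen == codePointIndex)
            {
                break;
            }

            // 1 for characters in the Basic Multilingual Plane,
            // 2 for characters stored as a surrogate pair.
            utf16Index += rune.Utf16SequenceLength;
            codePointsSeen++;
        }

        return utf16Index;
    }
}
```

Applied to the string "ℌ𝔢𝔩𝔩𝔬 𝔚𝔬𝔯𝔩𝔡", the code point indices 0, 5, 6, and 11 map to the UTF-16 indices 0, 9, 10, and 20. The walk is linear in the length of the string, so when converting many indices it may be preferable to convert them all in a single pass.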

skyace65 merged commit 768aa53 into godotengine:master Jul 18, 2025
1 check passed

skyace65 (Contributor) commented

Thanks! And congrats on your first merged PR!

Labels

area:manual, cherrypick:4.4, enhancement, topic:dotnet

Development

Successfully merging this pull request may close these issues.

Document that C# strings are UTF16 while Godot Strings are UTF32

5 participants