Open
Labels
enhancement (New feature or request)
Description
Certain file systems allow file names to contain code points that have a special meaning in Unicode, such as the surrogate U+D800, and/or values that are not Unicode characters at all.
The extended bodyfile 3 format currently does not specify how to handle these characters. The proposal is to escape such characters as "\u####" or "\U########", preferring the short form over the long form where possible.
- Control characters U+1-U+8, U+B-U+C, U+E-U+1F, U+7F-U+84, U+86-U+9F (already covered)
- Unicode surrogate characters U+D800-U+DFFF (Changes to escape Unicode surrogate codes #78)
- Undefined Unicode characters (Changes to handle special and non-Unicode characters #77 #95)
  - U+FDD0-U+FDDF
  - U+FFFE-U+FFFF
  - U+1FFFE-U+1FFFF
  - U+2FFFE-U+2FFFF
  - U+3FFFE-U+3FFFF
  - U+4FFFE-U+4FFFF
  - U+5FFFE-U+5FFFF
  - U+6FFFE-U+6FFFF
  - U+7FFFE-U+7FFFF
  - U+8FFFE-U+8FFFF
  - U+9FFFE-U+9FFFF
  - U+AFFFE-U+AFFFF
  - U+BFFFE-U+BFFFF
  - U+CFFFE-U+CFFFF
  - U+DFFFE-U+DFFFF
  - U+EFFFE-U+EFFFF
  - U+FFFFE-U+FFFFF
  - U+10FFFE-U+10FFFF
- Other values observed to be not printable (Changes to handle special and non-Unicode characters #77 #95)
  - U+2028, U+2029, U+E000, U+F8FF, U+F0000, U+FFFFD, U+100000, U+10FFFD
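
The escaping proposed above could be sketched roughly as follows. This is only an illustration, not the format's reference implementation: the names `needs_escape`, `escape_name` and `OTHER_VALUES` are hypothetical, lowercase hexadecimal digits are an assumption, and control characters (already covered) and literal backslashes are deliberately left out.

```python
# Hypothetical sketch of the proposed "\u####" / "\U########" escaping.
# Control characters and literal backslashes are NOT handled here.

# Individual values listed in this issue as observed to be not printable.
OTHER_VALUES = frozenset({
    0x2028, 0x2029, 0xE000, 0xF8FF,
    0xF0000, 0xFFFFD, 0x100000, 0x10FFFD,
})


def needs_escape(cp: int) -> bool:
    """Return True if the code point is one this issue lists for escaping."""
    if 0xD800 <= cp <= 0xDFFF:  # Unicode surrogates
        return True
    if 0xFDD0 <= cp <= 0xFDDF:  # undefined range U+FDD0-U+FDDF
        return True
    if (cp & 0xFFFE) == 0xFFFE:  # U+nFFFE and U+nFFFF in every plane
        return True
    return cp in OTHER_VALUES


def escape_name(name: str) -> str:
    """Escape listed code points, preferring the short form where possible."""
    parts = []
    for ch in name:
        cp = ord(ch)
        if not needs_escape(cp):
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04x}")  # short form (assumed lowercase hex)
        else:
            parts.append(f"\\U{cp:08x}")  # long form
    return "".join(parts)


if __name__ == "__main__":
    # A lone surrogate and a plane 1 undefined character in one name.
    print(escape_name("a\ud800b\U0001fffe.txt"))
```

Whether a literal backslash in a name also needs escaping, so that the output can be decoded unambiguously, is not settled by this sketch.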
Open questions
- What about "Unicode compatibility characters"?
- What about U+110000-U+FFFFFFFF?
- What if the original path uses a specific code page (encoding) and is converted to Unicode, but the result can be encoded back into more than one byte sequence in the original encoding, e.g. encoding U+2252 to cp932? What if there are 2 paths that decode to the same string? How can the original path best be preserved?
- What if a file name contains a path segment separator (e.g. \ or /)? If not escaped, this leads to ambiguity, e.g. if / is the path segment separator, is 'test/1234' a single file name or a path?
A related discussion: dfxml-working-group/dfxml_schema#34
Also consider whether the format should be extended with a header that specifies its encoding.
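
The code page question can be illustrated with a small round-trip check. This is a sketch under assumptions: the helper `round_trips` is hypothetical, and the byte values rely on Python's `cp932` codec following the Microsoft mapping, in which both 0x81 0xE0 and 0x87 0x90 are understood to decode to U+2252.

```python
def round_trips(raw: bytes, encoding: str) -> bool:
    """Return True if decoding and re-encoding reproduces the original bytes."""
    return raw.decode(encoding).encode(encoding) == raw


if __name__ == "__main__":
    # Two byte sequences assumed to decode to the same character, U+2252.
    for raw in (b"\x81\xe0", b"\x87\x90"):
        decoded = raw.decode("cp932")
        print(raw.hex(), f"U+{ord(decoded):04X}", round_trips(raw, "cp932"))
```

If a path fails this check, the Unicode string alone is not enough to recover the original bytes, which is why the question of how to preserve the original path matters.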