Skip to content

XML string handling is broken #9

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aduskett opened this issue Jan 25, 2024 · 6 comments
Closed

XML string handling is broken #9

aduskett opened this issue Jan 25, 2024 · 6 comments

Comments

@aduskett
Copy link

aduskett commented Jan 25, 2024

The title is a bit vague, so please let me explain:

Currently, there are two significant issues with how string handling is done:

  1. When processing a .od file, strings are not processed as byte arrays, which causes the size of arrays in generated c files to be much larger than intended. Take, for example, the following string: /X00/X00/X00. The old canfestival based on Python 2 creates an array of size three because it has three null characters, and Python2 treats strings as byte arrays. The current code treats the above as “/X00/X00/X00” which generates an array of size 12 because Python 3 treats all strings as UTF8!

  2. In the old canfestival GUI, strings were converted to base 10. IE: 0XFF is converted to decimal (255) for the GUI, then converted back to hex for the .OD file. The current GUI does not convert the base 16 numbers to base 10 for the GUI, which seems like a significant regression.

Any help? The XML parser had a lot of warnings and odd-looking code when I loaded it up into PyCharm.

Thanks!
Adam

@sveinse
Copy link
Member

sveinse commented Jan 25, 2024

The code quality of the python code in objdictgen has been (and still is) not great. The .od XML format is actually just a memdump of python objects and is the reason the format changes significantly between saves. This was not a particular wise design decision back in the days. This is why this tool now advocates using .json for storing the object dictionaries, and it has been designed to be stable & diffable.

What do you mean by "old canfestival GUI"? Can you be a bit more specific, please? There exists many different variants and forks of canfestival and I believe upstream has disappeared. I think we need to agree on what should be the supported variants of canfestival. My take would be to support the https://github.com/Laerdal/canfestival-3-asc repo, and I had (initially) no intent to support older versions. What is your use case?

What strings are this? Where is it used? Not all string types of canopen permits \x00 as valid characters.

In general, we can decide how we want to interpret strings. I suggest start making a pytest test function that test this string and asserts the expected behavior.

@robybona
Copy link

I am quite sure @aduskett is referring to the "legacy" file generation as referred in my issue #8 comment, when talking about "old canfestival GUI", and in general to file generation using objdictgen under Python 2.

I totally agree with you that the .od format should be obsoleted in favour of .json.

I also agree that https://github.com/Laerdal/canfestival-3-asc should be treated as the "official" one, and it is reasonable to not support older version too.

I think that, in case someone needs to use the "legacy" file generation, a reasonable solution could be to go for a specific venv, or a Docker container, using Python 2 and wxPython 2.7.
No need to support old variants, this way.

In relation to the current objdictgen/objdictedit version, I think the problem @aduskett is experiencing boils down to variables of "domain" type (that are managed as strings by the Python code), and not string (or "visible_string", using the canfestival nomenclature). Please see #10, where I explained what the problem is with "domain" and how to reproduce it.

@sveinse
Copy link
Member

sveinse commented Jan 30, 2024

The py2 objdictgen have an implicit encoding of strings and unicode in the od format:

<!-- A) This is a string -->
<attr name="s" type="string" value="hello" />

<!-- B) This is an unicode -->
<attr name="s" type="string">world</attr>

In py3, strings are all unicode, so this logic isn't easy to convert to py3. Currently format B is used when py3 is generating the od format, as it allows py3 generated ods to be read with py2. Because of this all strings of the py3 generated od will be using format B, while a py2 generated od will contain a mix of A and B. More precisely, only B in the few occasions where it explicitly use unicode.

This is where I think the xml encoding goes wrong for DOMAIN, is it requires a different handling that used to be implicit under py2. And this is what we need to find a solution for. One solution would be to add support for bytes in the od format. Note that this will break compatibility of the od files with py2. Would that be an acceptable solution? @aduskett @robybona

Are there other composite datatypes than DOMAIN that doesn't work in the UI? What about VISIBLE_STRING, OCTET_STRING and UNICODE_STRING, are they handled correctly? I suspect they're not.

@sveinse
Copy link
Member

sveinse commented Apr 13, 2024

@aduskett @robybona PR #16 fixes DOMAIN and should make the behavior equal to the legacy py2 handling. Please test. Can I assume that this issue is indeed about DOMAIN and we can close it when PR #16 is merged? Thanks.

@robybona
Copy link

@sveinse Yes, this issue can be closed once the PR #16 is merged.

@sveinse
Copy link
Member

sveinse commented Apr 15, 2024

PR #16 has now been merged. Closing issue. Thank you for testing @robybona

@sveinse sveinse closed this as completed Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants