
Errors for segments with size > 2147483647 bytes #16

Open
lblatchford opened this issue Feb 2, 2023 · 4 comments

@lblatchford

When attempting to parse a NITF 2.1 file with a large image segment, the following error occurred: "Parse Error: Failed to populate ImageSegment[1]. Cause: Parse Error: Cannot convert '2516582400' from Long type to Int (Value '2516582400' is out of range for xs:int)".

The PayloadLength element is typed as xs:int. When the schema file is changed to make PayloadLength xs:unsignedLong, another error occurs:

"Failed to convert: Parse Error: Failed to populate ImageSegment[1]. Cause: Parse Error: Choice dispatch branch failed: List( Parse Error: Length for xs:hexBinary exceeds maximum of 2147483647 bytes: 2516582400 )".

I believe the xs:hexBinary type should have an unbounded maximum, so the reason for this error is unclear.

The test data would have to be approved for release by DISA; the request form and details about the files involved can be provided on request.

@stevedlawrence
Member

Under the hood, Daffodil uses a Java array to store hexBinary bytes, and Java arrays have a maximum size of 2^31 − 1 bytes (i.e. ~2 GB).
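As a concrete check (an illustrative sketch, not Daffodil code): the 2147483647 in the error message is exactly Java's Integer.MAX_VALUE, the largest possible array length, and the failing segment size exceeds it:

```java
// Illustrates why xs:hexBinary is capped at 2^31 - 1 bytes in Daffodil:
// Java arrays are indexed by int, so a byte[] cannot exceed Integer.MAX_VALUE.
public class ArrayLimit {
    public static void main(String[] args) {
        long segmentSize = 2516582400L;       // segment size from the NITF error message
        System.out.println(Integer.MAX_VALUE);              // 2147483647
        System.out.println(segmentSize > Integer.MAX_VALUE); // true: cannot fit in one array
    }
}
```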

If hexBinary larger than 2 GB is needed, it's probably better to use the Daffodil Binary Large Object (BLOB) extension. To use this, change the type from xs:hexBinary to xs:anyURI and add the property dfdlx:objectKind="bytes". This causes Daffodil to write the raw bytes to a temporary file and put the path to that file in the infoset instead of the hex binary. This is much more efficient than xs:hexBinary and has a much larger size limit, I believe 2^63 − 1 bits, which is on the order of 10^18 bytes.
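A minimal sketch of that change (untested; the element name and length expression are placeholders for whatever the NITF schema actually uses, but dfdlx:objectKind="bytes" is the actual extension property):

```xml
<!-- Hypothetical BLOB variant of the image segment payload.
     Assumes the usual dfdl/dfdlx namespace prefixes are declared. -->
<xs:element name="payload" type="xs:anyURI"
            dfdlx:objectKind="bytes"
            dfdl:lengthKind="explicit"
            dfdl:length="{ ../PayloadLength }"
            dfdl:lengthUnits="bytes"/>
```

On parse, the infoset then contains a URI pointing at the extracted temporary file rather than the hex-encoded bytes.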

@lblatchford
Author

lblatchford commented Feb 2, 2023 via email

@mbeckerle
Member

mbeckerle commented Jan 10, 2025

The large hexBinary fields in NITF should be replaced by arrays with large maxOccurs containing smaller hexBinary elements. This would enable them to represent blobs of data much larger than a single DFDL element can represent without hitting any int-size limits or JVM array maximum limits.

E.g., something like (untested):

<element name="whatever" dfdl:lengthKind="explicit"
         dfdl:length='{ ... the big blob length ... }'>
  <complexType>
    <sequence>
      <element name="blob" dfdl:lengthKind="implicit">
        <!-- This blob element allows a dfdl:outputValueCalc='{
             dfdl:contentLength(..../pixels/blob) }' to work to capture
             the length when unparsing. -->
        <complexType>
          <sequence>
            <!-- Avoid giant lines. This is XML. Users *may* want to
                 open it in a text editor.
                 Note: max size of blob is 10,000,000,100 bytes. That's ~10 GB. -->
            <element name="a" type="xs:hexBinary" minOccurs="0"
                     maxOccurs="100000000" dfdl:lengthKind="explicit"
                     dfdl:length="100" dfdl:occursCountKind="implicit"/>
            <element name="last" type="xs:hexBinary" minOccurs="0"
                     maxOccurs="1" dfdl:lengthKind="delimited"/>
          </sequence>
        </complexType>
      </element>
    </sequence>
  </complexType>
</element>

This change would affect the shape of the infoset, so it would require a major new version, and many of the tests' expected infosets would need to be updated.

@lblatchford
Author

lblatchford commented Jan 10, 2025 via email
