Support for content length/size in Link Objects #107

Open
HadrienGardeur opened this issue Feb 5, 2025 · 9 comments · May be fixed by #110

Comments

@HadrienGardeur
Member

The LCP spec allows links to contain two pieces of information that are currently missing in RWPM:

  • content length
  • and a hash of the content

As we're working on accessibility, both pieces of information are becoming increasingly useful.

Content length is also useful elsewhere, for example to optimize partial fetches on a packaged publication.

I would recommend naming the property length and giving it the same value semantics as the Content-Length HTTP header.

Any thoughts on this one?

@chocolatkey
Member

chocolatkey commented Feb 5, 2025

In case it's useful: in the experimental work I did on WP manifests, which lets you read resources directly from a ZIP without having to parse its central directory, the structure was:

{
  "method": 7, // ZIP compression method
  "offset": 123, // Offset from the beginning of the ZIP
  "uncompressedSize": 4567, // Uncompressed resource size
  "compressedSize": 1234 // 0 if not compressed
}

If you just have the length alone, it doesn't help as much for range reads, because you're still going to need to parse the central directory and that has the lengths as well.
Also, I think having length behave like Content-Length is going to be confusing: if a resource is compressed in an HTTP response, Content-Length is the compressed length. A key named length could then hold the compressed size without the client even knowing whether the resource is compressed. I would expect a length key to be the uncompressed resource size in bytes.
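
To make the ambiguity concrete, here is a minimal sketch using only the Python standard library. The sample payload is made up for illustration. It compares the uncompressed size of a resource with the byte count that Content-Length would report if the server applied gzip content encoding:

```python
import gzip

# Hypothetical resource body; repetitive on purpose so that it compresses well.
body = b"<p>Hello, Readium!</p>\n" * 200

# What a reader would most likely expect a size/length key to hold.
uncompressed_size = len(body)

# What travels over the wire with "Content-Encoding: gzip".
encoded = gzip.compress(body)

# What the Content-Length header would report for that response.
content_length = len(encoded)

print(uncompressed_size, content_length)
```

The two numbers differ, so a key that mirrors Content-Length would only be meaningful alongside information about the encoding in use.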

For the hash, I would suggest a prefix indicating the hashing algorithm used, such as blake2b- or sha256-, similar to SRI (Subresource Integrity).
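
As a rough illustration of the prefixed form, the sketch below builds and checks an SRI-style string with Python's hashlib. The function names prefixed_hash and verify are hypothetical, and encoding the digest as base64 is an assumption borrowed from SRI rather than something decided in this thread:

```python
import base64
import hashlib


def prefixed_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Return '<algorithm>-<base64 digest>', in the style of SRI metadata."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify(data: bytes, expected: str) -> bool:
    """Check data against a prefixed hash, reading the algorithm from the prefix."""
    # Standard base64 never contains '-', so the first '-' ends the prefix.
    algorithm, _, _ = expected.partition("-")
    return prefixed_hash(data, algorithm) == expected


resource = b"<html><body>Chapter 1</body></html>"
for name in ("sha256", "blake2b"):
    value = prefixed_hash(resource, name)
    print(value, verify(resource, value))
```

A consumer can pick the right algorithm from the prefix alone, which leaves room to introduce new algorithms later without changing the shape of the property.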

@HadrienGardeur
Member Author

That structure would show up under properties then.

This issue is mostly focused on adding a new top level property, similar to height, width, duration, language or bitrate, which all convey information about the resource itself.

The uncompressed size would make the most sense to me in that case, while an extension dedicated to packages and/or compression could cover the other three elements.

@HadrienGardeur
Member Author

HadrienGardeur commented Feb 10, 2025

If I go back to your example above @chocolatkey, this could look like this:

{
  "href": "chapter1.mp3",
  "type": "audio/mpeg",
  "size": 234,
  "properties": {
    "archive": {
      "method": "deflate",
      "offset": 123,
      "compressedSize": 123
    }
  }
}

I'm not entirely convinced by the use of an integer for method, I'd rather use strings with an enum.

Update: Replaced compression with archive and replaced integer with a string for method.
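
For illustration, here is a minimal sketch of how a client might use these values to read a resource without parsing the central directory. It assumes that offset points at the first byte of the entry data (after the ZIP local file header), which this thread has not settled, and read_entry is a hypothetical helper name:

```python
import io
import struct
import zipfile
import zlib


def read_entry(package: bytes, size: int, archive: dict) -> bytes:
    """Read one resource using only the values carried by its Link Object."""
    start = archive["offset"]  # assumed: offset of the entry data itself
    if archive["method"] == "store":
        return package[start:start + size]
    raw = package[start:start + archive["compressedSize"]]
    # ZIP entries hold raw DEFLATE streams (no zlib header), hence wbits=-15.
    return zlib.decompress(raw, -15)


# Build a small ZIP in memory and derive the values a manifest generator might emit.
content = b"<html><body>Chapter 1</body></html>" * 50
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("chapter1.html", content)
    info = zf.getinfo("chapter1.html")
package = buffer.getvalue()

# A local file header is 30 fixed bytes followed by the file name and extra field.
name_len, extra_len = struct.unpack_from("<HH", package, info.header_offset + 26)
link = {
    "href": "chapter1.html",
    "type": "text/html",
    "size": info.file_size,
    "properties": {
        "archive": {
            "method": "deflate",
            "offset": info.header_offset + 30 + name_len + extra_len,
            "compressedSize": info.compress_size,
        }
    },
}

assert read_entry(package, link["size"], link["properties"]["archive"]) == content
```

If offset pointed at the local file header instead, the client would have to read and skip that header first, so the definition is worth pinning down in the module.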

HadrienGardeur pinned this issue Feb 10, 2025

@mickael-menu
Member

In the mobile toolkits we have something similar but we used an archive key instead of compression, because the resource is not systematically compressed. offset would be useful even if there's no compression.

As it was an extension, the key is a URI:

{
    "properties": {
        "https://readium.org/webpub-manifest/properties#archive": {
            "entryLength": 1234,
            "isEntryCompressed": true
        }
    }
}

I'm not entirely convinced by the use of an integer for method, I'd rather use strings with an enum.

I agree.

@HadrienGardeur
Member Author

In the mobile toolkits we have something similar but we used an archive key instead of compression, because the resource is not systematically compressed.

That's a good point. We might need to think about the use cases though, for example would we ever need to use this for streaming over HTTPS as well?

@HadrienGardeur
Member Author

@chocolatkey @mickael-menu I've updated the example above based on the feedback so far.

I'd like to open a PR for a new module dedicated to that, but wanted to point out that there's a bit of redundancy currently with the encryption module: https://github.com/readium/webpub-manifest/blob/master/modules/encryption.md

This module already contains originalLength, which would be deprecated in favor of size here. It also contains a compression property which would be the equivalent of method (is this the right name though?) in archive.

IMO, cleanly separating properties that are related to encryption from the ones related to the use of an archive is a good idea overall.

@HadrienGardeur
Member Author

We've discussed this extensively with @chocolatkey earlier this week and we seem to be in agreement to do the following:

  • Open a PR that adds size as a first-class property for Link Objects
  • Open another PR that adds a new module for archive and deprecates both originalLength and compression in encrypted

Here's an updated example based on this plan:

{
  "href": "chapter1.html",
  "type": "text/html",
  "size": 234,
  "properties": {
    "archive": {
      "method": "deflate",
      "offset": 123,
      "compressedSize": 123
    },
    "encrypted": {
      "scheme": "http://readium.org/2014/01/lcp",
      "profile": "http://readium.org/lcp/basic-profile",
      "algorithm": "http://www.w3.org/2001/04/xmlenc#aes256-cbc"
    }
  }
}
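
To sketch what the deprecation could mean for existing manifests, the snippet below moves the two legacy keys to their proposed locations. The helper name migrate_link is hypothetical, and copying the compression value over unchanged assumes that both properties share the same vocabulary, which has not been confirmed here:

```python
import copy


def migrate_link(link: dict) -> dict:
    """Move legacy encryption-module keys to size and archive.method."""
    result = copy.deepcopy(link)  # leave the caller's Link Object untouched
    properties = result.get("properties", {})
    encrypted = properties.get("encrypted", {})

    if "originalLength" in encrypted:
        # An explicit size already on the link takes precedence.
        result.setdefault("size", encrypted.pop("originalLength"))
    if "compression" in encrypted:
        archive = properties.setdefault("archive", {})
        # Assumption: legacy compression values are valid archive.method values.
        archive.setdefault("method", encrypted.pop("compression"))
    return result


legacy = {
    "href": "chapter1.html",
    "type": "text/html",
    "properties": {
        "encrypted": {
            "scheme": "http://readium.org/2014/01/lcp",
            "originalLength": 234,
            "compression": "deflate",
        }
    },
}
print(migrate_link(legacy))
```

Note that offset and compressedSize cannot be derived from the legacy keys, so a migrated link would still need them filled in from the package itself.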

@danielweck
Member

why archive.compressedSize instead of just archive.size? (for example when archive.method is store instead of deflate)

@HadrienGardeur
Member Author

why archive.compressedSize instead of just archive.size? (for example when archive.method is store instead of deflate)

If we're sure that people won't confuse the size in the Link Object with size in archive, we could do that.

But with store you wouldn't need compressedSize at all: method and offset would be enough, since the original size is already given by size in this example.
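
A small sketch of that rule, under the same assumption that offset points at the entry data. The helper names entry_byte_range and range_header are hypothetical. The client falls back to the Link Object's size when the entry is stored, and uses compressedSize otherwise, for instance to build an HTTP Range header:

```python
def entry_byte_range(link: dict) -> tuple[int, int]:
    """Return the inclusive (first, last) byte positions of a resource in its package."""
    archive = link["properties"]["archive"]
    if archive["method"] == "store":
        # Stored entries are not compressed, so size is also the length on disk.
        length = link["size"]
    else:
        length = archive["compressedSize"]
    first = archive["offset"]
    return first, first + length - 1


def range_header(link: dict) -> str:
    """Format the byte range as the value of an HTTP Range request header."""
    first, last = entry_byte_range(link)
    return f"bytes={first}-{last}"


stored = {
    "href": "cover.jpg",
    "size": 234,
    "properties": {"archive": {"method": "store", "offset": 123}},
}
deflated = {
    "href": "chapter1.html",
    "size": 234,
    "properties": {
        "archive": {"method": "deflate", "offset": 123, "compressedSize": 123}
    },
}
print(range_header(stored))    # bytes=123-356
print(range_header(deflated))  # bytes=123-245
```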

HadrienGardeur linked a pull request Feb 18, 2025 that will close this issue