This repository was archived by the owner on Apr 29, 2020. It is now read-only.

Ensuring completeness and catching corruption #4

Open
wking opened this issue May 5, 2015 · 2 comments

Comments

@wking

wking commented May 5, 2015

As discussed in ipfs/kubo#1195, Git's packfiles have a trailing hash which seems to function as a single check for incomplete/corrupted packs (although I don't see how it could distinguish between the two cases). @chriscool wants some sort of fast, simple check for these issues, but I'd rather handle it in a lower- or higher-level protocol, and @jbenet wants to wrap the whole archive in a Merkle object (and somehow transport its hash?) and use link hashes to verify correctness and isolate corruption/truncation.
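
For illustration, a trailing-hash check could look roughly like the sketch below. It assumes a Git-packfile-like layout in which the last 20 bytes are the SHA-1 of everything before them; the helper names `append_trailer` and `trailer_ok` are hypothetical, not part of Git or of any archive format discussed here. Note that the check reports truncation and corruption in the same way, as a plain failure.

```python
import hashlib

TRAILER_LEN = 20  # size of a SHA-1 digest in bytes


def append_trailer(payload: bytes) -> bytes:
    """Return the payload followed by the SHA-1 of the payload."""
    return payload + hashlib.sha1(payload).digest()


def trailer_ok(blob: bytes) -> bool:
    """True if the last 20 bytes are the SHA-1 of everything before them."""
    if len(blob) < TRAILER_LEN:
        return False
    body, trailer = blob[:-TRAILER_LEN], blob[-TRAILER_LEN:]
    return hashlib.sha1(body).digest() == trailer


good = append_trailer(b"example archive body")
truncated = good[:-5]          # lost the last few bytes
corrupted = b"X" + good[1:]    # one byte flipped at the start

# Both kinds of damage fail the same check; the trailer alone cannot
# say which of the two happened.
print(trailer_ok(good), trailer_ok(truncated), trailer_ok(corrupted))
```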

@chriscool

About checking in the transport layer, it does not address truncation when files are written on the disk.

I am OK with wrapping an archive in a Merkle object, if its hash is also transferred, as this gives the same guarantees as a trailing hash.

By the way, with a Git pack file it's also possible to check each object and isolate corruption/truncation, but this is done only when there is no other way to get the data back.
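
To make the "isolate corruption/truncation" idea concrete, here is a minimal sketch of per-chunk link hashes under a single root hash. It is only an assumption about how such a wrapper might work (a flat two-level tree using SHA-256), not the actual IPFS Merkle object format or Git's per-object check; `build_links` and `find_bad_chunks` are hypothetical names.

```python
import hashlib


def build_links(chunks):
    """Hash each chunk, then hash the concatenated links to get a root."""
    links = [hashlib.sha256(c).digest() for c in chunks]
    root = hashlib.sha256(b"".join(links)).hexdigest()
    return root, links


def find_bad_chunks(root, links, chunks):
    """Return indices of chunks that are missing or do not match their link."""
    if hashlib.sha256(b"".join(links)).hexdigest() != root:
        raise ValueError("link list does not match the root hash")
    return [
        i
        for i, link in enumerate(links)
        if i >= len(chunks) or hashlib.sha256(chunks[i]).digest() != link
    ]


chunks = [b"aaa", b"bbb", b"ccc"]
root, links = build_links(chunks)

# Second chunk corrupted, third chunk lost to truncation.
damaged = [b"aaa", b"bXb"]
print(find_bad_chunks(root, links, damaged))
```

Only the root hash has to travel over a trusted path; the link list can be verified against it, and each chunk against its link.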

@wking
Author

wking commented May 6, 2015

On Wed, May 06, 2015 at 06:53:43AM -0700, Christian Couder wrote:

> About checking in the transport layer, it does not address
> truncation when files are written on the disk.

Well, you can check for truncation in writing to the disk with a
save-time atomic-copy operation (to protect against a power outage,
etc.) and then protect against corruption and later truncation by
using your filesystem's hashing mechanism [1]. Basically, saving to
and then reading back from the disk is just another transport
mechanism like TCP, and ensuring that it is fairly reliable is a job
that I think is better left to the transport implementation itself.
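
A save-time atomic copy along these lines could be sketched as follows. This is one common pattern (write to a temporary file in the same directory, flush it to disk, then rename over the target), offered as an assumption about what is meant rather than a prescribed method; `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to stable storage
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # Clean up the partial temporary file; the target is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A stricter version would also fsync the containing directory after the rename; that step is left out of this sketch.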

> I am OK with wrapping an archive in a Merkle object, if its hash is
> also transferred, as this gives the same guarantees as a trailing hash.

Yeah, this is basically the same thing as a trailing hash. If we want
to require it, I think we might as well follow Git and just add it to
the archive file. But I'm still leaning towards “folks who don't
trust their underlying transport (TCP, btrfs, …) should implement
their own checking mechanism on top of the archive format”
(e.g. distributing the archive hash along with the archive, or
distributing a detached GnuPG signature along with the archive, or …).
Personally, I'd go with the GnuPG signature for archives I was
transmitting across the network, but I don't want to bake that into
the file format ;). Is higher-level truncation/corruption checking
too heavy a requirement?
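
As a rough sketch of the "distribute the archive hash along with the archive" option, the check can stay entirely outside the file format. SHA-256 and the helper names below are assumptions for illustration; a detached GnuPG signature would follow the same shape but add authentication, which a bare digest does not provide.

```python
import hashlib
import hmac


def detached_digest(archive: bytes) -> str:
    """Hex digest to publish next to the archive, e.g. in a sidecar file."""
    return hashlib.sha256(archive).hexdigest()


def verify_detached(archive: bytes, expected_hex: str) -> bool:
    """Compare the archive against a digest received out of band."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(detached_digest(archive), expected)


archive = b"example archive bytes"
sidecar = detached_digest(archive) + "\n"  # as it might appear in a sidecar file

print(verify_detached(archive, sidecar))
print(verify_detached(archive[:-1], sidecar))  # truncated copy fails
```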
