Using a protobuf or other IPFS-ish format for the wrapping archive #6
Comments
@wking we're going to have to do something like the fanout index anyway, since we don't have fixed block sizes.
My thought is that we use a protobuf very similar to the merkledag's own, and simply add an 'offset' field to each link, which gives the offset of that child within the archive.
Writing then becomes as easy as traversing the DAG. Rough pseudocode:

    func writeDag(n *dag.Node, w io.WriteSeeker) error {
        w.Write(n.Hash())
        w.Write(n.Data)
        // remember where the link table starts
        offset := w.CurrentOffset
        for _, link := range n.Links {
            w.Write(link.Name)
            w.Write(link.Hash)
            w.Write(link.Size)
            w.Write(int64(0)) // placeholder for the child's offset
        }
        for _, link := range n.Links {
            childoff := w.CurrentOffset
            writeDag(link.GetNode(), w)
            // Now seek back and write the offset into the link table
            w.Seek(offset + (math to find right location))
            w.Write(childoff)
            // now seek back down and continue writing
        }
        return nil
    }

This would allow O(log(n)) (depth of the DAG) time lookups.
On Tue, May 05, 2015 at 08:26:44PM -0700, Jeromy Johnson wrote:
I'm all for fanout, but Git's packfiles use the fanout table for
On Tue, May 05, 2015 at 08:36:00PM -0700, Jeromy Johnson wrote:
If you are looking up objects by following a QmKey1/QmKey2/QmKey3 I was thinking of fanout and index tables as Git uses them, a single
Hrm, I guess we should lay out what properties we are interested in having before attempting to design anything.
In ipfs/kubo#1195, @jbenet proposed making the archive file itself a DAG object (or maybe just a protobuf or something ;). I'm not entirely clear on what he has in mind.
Looking over the protobuf encoding, I think it's a poor choice for the index, because efficient bisection requires fixed-size records, and protobuf encoding works hard to not use fixed-size fields. Jumping to the n-th record is slow if you have to read all n-1 records to figure out how long they are. We'll have the same problem if we try to put something like Git's fanout index in a protobuf. On the other hand, I don't have a problem with putting a fanout table, index, and array of serialized objects into a wrapping protobuf. But I don't see that gaining a lot of efficiency, either in programming time or processing efficiency, with such a small portion of the file syntax being handled by protobufs.
So I'm currently leaning against this, but I admit to not understanding the vision ;). If anyone wants to clear me up on this front, I'd be happy for the enlightenment.