Feature request for GGUF: store offset to tensor_data in the header #11737

certik · 2025-02-07T16:26:14Z

When writing custom readers for GGUF, it would be very helpful to be able to skip the header and go directly to tensor_data to load all arrays (if I know their order and dimensions). Currently I have two options:

1. Either implement all metadata types and parse the whole header, even if I don't need it (this adds complexity to the reader),
2. Or store the offset to tensor_data as the first variable in the header (an example of this approach is here: https://github.com/certik/fastGPT/blob/7d96ec2e23b2a1a07aea625f72661d3f650c7ee5/driver.f90#L72).

I use the second approach, which works great, but it means I have to create special GGUF files that store general.data_offset as the first variable. If the GGUF format had this offset as part of the file format, then any GGUF file could be parsed easily with this approach. Many other file formats have a similar feature: they record each section's length so that a reader can skip the sections it does not care about. This would also make it possible to use existing readers with future GGUF versions that might add more metadata types (the reader would simply skip the rest of the header if it doesn't support a given type).
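To make the second approach concrete, here is a minimal sketch of a reader that only understands the fixed part of the header plus one leading key. It assumes the GGUF v2/v3 layout (4-byte magic "GGUF", uint32 version, uint64 tensor count, uint64 metadata count), a little-endian host, and that general.data_offset is stored as a 32-bit or 64-bit integer; the helper names `take` and `read_data_offset` are made up for illustration and are not part of any existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Copy one fixed-width value out of the buffer and advance `pos`.
// Assumes a little-endian host, so no byte swapping is done.
template <typename T>
bool take(const unsigned char *buf, size_t size, size_t &pos, T &out) {
    if (size - pos < sizeof(T)) return false;
    std::memcpy(&out, buf + pos, sizeof(T));
    pos += sizeof(T);
    return true;
}

// Hypothetical helper: succeeds only when the first metadata key is
// `general.data_offset` and its value is an integer.
bool read_data_offset(const unsigned char *buf, size_t size, uint64_t &offset) {
    assert(buf != nullptr);
    size_t pos = 0;
    uint32_t magic = 0, version = 0;
    uint64_t tensor_count = 0, kv_count = 0;
    if (!take(buf, size, pos, magic) || std::memcmp(&magic, "GGUF", 4) != 0) return false;
    if (!take(buf, size, pos, version) || !take(buf, size, pos, tensor_count) ||
        !take(buf, size, pos, kv_count) || kv_count == 0) return false;

    // A GGUF string is a uint64 length followed by that many bytes.
    const std::string expected = "general.data_offset";
    uint64_t key_len = 0;
    if (!take(buf, size, pos, key_len) || key_len != expected.size()) return false;
    if (size - pos < key_len || std::memcmp(buf + pos, expected.data(), key_len) != 0) return false;
    pos += key_len;

    uint32_t type = 0;
    if (!take(buf, size, pos, type)) return false;
    if (type == 4 || type == 5) {    // UINT32 or INT32 (assumed storage)
        uint32_t v = 0;
        if (!take(buf, size, pos, v)) return false;
        offset = v;
        return true;
    }
    if (type == 10 || type == 11) {  // UINT64 or INT64 (assumed storage)
        return take(buf, size, pos, offset);
    }
    return false;
}
```

The point of the sketch is that the reader never needs to understand strings, arrays, or any other value type beyond this one key: once `offset` is known, it can jump straight to the tensor data.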

If implemented in the next GGUF version, this would make it easy to write a simple reader in custom computational code and reuse existing GGUF writers, thus making the GGUF format more versatile for almost any numerical computational work.

Related proposal: #3975.

networkbassem · 2025-02-07T19:15:54Z

I totally agree that adding a tensor_data_offset to the GGUF format would make life a lot easier for custom readers! The current options require either parsing the entire metadata, which adds unnecessary complexity, or modifying GGUF files to include a special offset. If the GGUF format supported this offset as part of the specification, it would make things much cleaner and more efficient.
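For a sense of the complexity involved in the first option, here is a rough sketch of what a reader needs just to step over one metadata value it does not care about. The type IDs and widths follow the published GGUF value types (0 to 12, with STRING = 8 and ARRAY = 9); the function names are hypothetical and the code assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte width of each fixed-size GGUF value type, indexed by type id.
// 0 marks the variable-length types STRING (8) and ARRAY (9).
static const size_t kWidth[13] = {1, 1, 2, 2, 4, 4, 4, 1, 0, 0, 8, 8, 8};

// Copy one fixed-width value out of the buffer and advance `pos`.
template <typename T>
static bool take(const unsigned char *buf, size_t size, size_t &pos, T &out) {
    if (size - pos < sizeof(T)) return false;
    std::memcpy(&out, buf + pos, sizeof(T));
    pos += sizeof(T);
    return true;
}

// Advance `pos` past one value of the given type.
// Returns false on truncated input or an unknown type id.
bool skip_value(const unsigned char *buf, size_t size, size_t &pos, uint32_t type) {
    assert(pos <= size);
    if (type >= 13) return false;
    if (kWidth[type] != 0) {  // fixed-width scalar
        if (size - pos < kWidth[type]) return false;
        pos += kWidth[type];
        return true;
    }
    if (type == 8) {  // STRING: uint64 length, then the bytes
        uint64_t len = 0;
        if (!take(buf, size, pos, len) || size - pos < len) return false;
        pos += len;
        return true;
    }
    // ARRAY: uint32 element type, uint64 count, then the elements.
    uint32_t elem = 0;
    uint64_t count = 0;
    if (!take(buf, size, pos, elem) || !take(buf, size, pos, count)) return false;
    for (uint64_t i = 0; i < count; ++i) {
        if (!skip_value(buf, size, pos, elem)) return false;
    }
    return true;
}
```

A full header parser also has to loop over every key and then over every tensor info entry before it knows where the data starts, and a reader like this one stops working as soon as a new type id appears. A stored offset would avoid both problems.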

We could add an optional tensor_data_offset field in the GGUF header to allow readers to quickly access the tensor data, skipping over unnecessary metadata. This would also keep the format backward-compatible because older readers could simply ignore this field.

1. Define the tensor_data_offset in the GGUF header (it could be a simple 8-byte unsigned integer counting bytes from the start of the file).
2. Modify the reader to check whether the tensor_data_offset exists and, if it does, use it to jump directly to the tensor data.

Possible GGUF format with tensor_data_offset (a conceptual view only; GGUF itself is a binary format):

{
  "metadata": {
    "tensor_data_offset": 1234567
  },
  "tensor_data": [ ... ]
}

Code Example for Reader:

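#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Illustrative layout only: this assumes the proposed offset is the very
// first field of the file. The current GGUF header starts with a magic
// number and version instead, so this does not work on existing files.
struct GGUFHeader {
    uint64_t tensor_data_offset;  // Offset to the tensor data
    // other metadata...
};

void read_tensor_data(const std::string &file_path) {
    std::ifstream file(file_path, std::ios::binary);
    if (!file.is_open()) {
        std::cerr << "Error opening file!" << std::endl;
        return;
    }

    // Read the GGUF header to get the offset
    GGUFHeader header{};
    if (!file.read(reinterpret_cast<char*>(&header), sizeof(header))) {
        std::cerr << "Error reading header!" << std::endl;
        return;
    }

    // If there's a tensor data offset, jump to it
    if (header.tensor_data_offset != 0) {
        file.seekg(static_cast<std::streamoff>(header.tensor_data_offset), std::ios::beg);

        // Read up to 1024 bytes as a placeholder; a real reader would size
        // each read from the known tensor dimensions
        char tensor_data[1024];
        file.read(tensor_data, sizeof(tensor_data));
        if (file.gcount() == 0) {
            std::cerr << "No tensor data at the stored offset!" << std::endl;
            return;
        }

        // Process tensor data...
        std::cout << "Successfully loaded " << file.gcount()
                  << " bytes of tensor data!" << std::endl;
    } else {
        std::cerr << "Tensor data offset not found!" << std::endl;
    }
    // The ifstream destructor closes the file
}

int main() {
    // Example usage
    read_tensor_data("model.gguf");
    return 0;
}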

This allows you to skip directly to the tensor data without having to parse the entire header.

ggerganov · 2025-02-08T09:33:47Z

Maintainer

> skip the header and go directly to tensor_data to load all arrays (if I know their order and dimensions)

This seems like a very narrow use case - generally one does not know the order and the dimensions of the tensor data.

certik · 2025-02-08T15:44:35Z

Author

In the above use case, I first read the 10 parameters of the GPT-2 model here: https://github.com/certik/fastGPT/blob/7d96ec2e23b2a1a07aea625f72661d3f650c7ee5/driver.f90#L106, then use those dimensions to properly read the rest of the arrays. So one doesn't need to know the dimensions, but one needs to know the order. The big advantage is that it's easy to read it in any computational code in any language.
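As a rough illustration of that pattern (parameters first, then arrays sized from them), here is a minimal sketch. Everything specific in it is an assumption for the example: it uses only two int32 parameters instead of the 10 GPT-2 ones, assumes they sit at the start of the data section, and the names `Model`, `n_vocab`, `n_embd`, `wte`, `ln_g`, and `load_model` are hypothetical rather than taken from fastGPT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical two-parameter model; names and order are illustrative only.
struct Model {
    int32_t n_vocab = 0;
    int32_t n_embd = 0;
    std::vector<float> wte;   // n_vocab * n_embd values
    std::vector<float> ln_g;  // n_embd values
};

// Copy `count` float32 values out of the buffer and advance `pos`.
static bool take_floats(const unsigned char *buf, size_t size, size_t &pos,
                        size_t count, std::vector<float> &out) {
    if ((size - pos) / sizeof(float) < count) return false;
    out.resize(count);
    if (count != 0) std::memcpy(out.data(), buf + pos, count * sizeof(float));
    pos += count * sizeof(float);
    return true;
}

// Read the parameters at `data_offset`, then the arrays in their fixed order.
bool load_model(const unsigned char *buf, size_t size, size_t data_offset, Model &m) {
    assert(buf != nullptr);
    if (data_offset > size || size - data_offset < 2 * sizeof(int32_t)) return false;
    size_t pos = data_offset;
    std::memcpy(&m.n_vocab, buf + pos, sizeof(int32_t)); pos += sizeof(int32_t);
    std::memcpy(&m.n_embd, buf + pos, sizeof(int32_t));  pos += sizeof(int32_t);
    if (m.n_vocab <= 0 || m.n_embd <= 0) return false;

    // The dimensions come from the file; only the order is hard-coded.
    const size_t ne = static_cast<size_t>(m.n_embd);
    return take_floats(buf, size, pos, static_cast<size_t>(m.n_vocab) * ne, m.wte) &&
           take_floats(buf, size, pos, ne, m.ln_g);
}
```

The reader hard-codes the order of the arrays but takes every size from the file itself, which is why it ports easily to Fortran, C, or any other language with raw binary reads.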
