
Data Retrieval Limitation When Parsing ISO and Reading a File Larger than 4GB (version = 1.2.0) #275

Open
Charger0 opened this issue Jan 2, 2025 · 4 comments


Charger0 commented Jan 2, 2025

Description

I've run into a troublesome issue. I am parsing an ISO file and reading a file inside it that is larger than 4GB. During the read I can only obtain about 4GB minus 2KB of data, never the file's full contents, which blocks the rest of my processing workflow.
Steps to Reproduce

  1. Preparation:
	disk, err := diskfs.OpenWithMode("1.iso", diskfs.ReadOnly)
	if err != nil {
		log.Fatal(err)
	}

	// get the first filesystem on the disk
	fs, err := disk.GetFilesystem(0)
	if err != nil {
		log.Fatal(err)
	}
	file, err := fs.OpenFile(filePath, os.O_RDONLY)
	if err != nil {
		log.Fatal(err)
	}

	// get the current position
	originalPos, err := file.Seek(0, io.SeekCurrent)
	if err != nil {
		fmt.Printf("error getting current position: %v\n", err)
		return
	}

	// seek to the end of the file to get its size
	endPos, err := file.Seek(0, io.SeekEnd)
	if err != nil {
		fmt.Printf("error getting file size: %v\n", err)
		return
	}
	fileSize := endPos - originalPos
	fmt.Printf("file size: %d bytes\n", fileSize)
  2. Example: only about 4GB can be read

[screenshot: program output reporting a file size just under 4GB]

However, the actual file size:

      root@charger-PC:/data/home/charger/Downloads# ls -al |grep filesystem.squashfs 
      -rw-r--r--  1 charger charger 4580610048 Oct 23 13:45 filesystem.squashfs
@Charger0 Charger0 changed the title Data Retrieval Limitation When Parsing ISO and Reading a File Larger than 4GB Data Retrieval Limitation When Parsing ISO and Reading a File Larger than 4GB (version = 1.2.0) Jan 2, 2025
Collaborator

deitch commented Jan 2, 2025

@Charger0 welcome. Thanks for the detailed recreation of the problem. Do you mind editing the original comment to describe how to create an ISO with an embedded file of greater than 4GB?

Sure, I know it is straightforward, just create a random file in a tempdir and run mkisofs or similar, but it is helpful for people in the future to have an exact recreation path, from zero to done.


deitch commented Jan 2, 2025

Also, it is helpful if the example provided can be made a standalone program that you can run with go run. I know, it is not too hard to add it, but it makes a difference to people working issues.


deitch commented Jan 2, 2025

FWIW, here is how I recreated it:

package main

import (
        "fmt"
        "io"
        "log"
        "os"

        "github.com/diskfs/go-diskfs"
)

func main() {
        if len(os.Args) < 3 {
                log.Fatalf("Usage: %s path-to-iso file-path-inside-iso", os.Args[0])
        }
        isoPath, filePath := os.Args[1], os.Args[2]
        disk, err := diskfs.Open(isoPath, diskfs.WithOpenMode(diskfs.ReadOnly))
        if err != nil {
                log.Fatal(err)
        }

        fs, err := disk.GetFilesystem(0)
        if err != nil {
                log.Fatal(err)
        }
        file, err := fs.OpenFile(filePath, os.O_RDONLY)
        if err != nil {
                log.Fatal(err)
        }

        originalPos, err := file.Seek(0, io.SeekCurrent)
        if err != nil {
                log.Fatal(err)
        }

        endPos, err := file.Seek(0, io.SeekEnd)
        if err != nil {
                log.Fatal(err)
        }
        fileSize := endPos - originalPos
        fmt.Printf("fileSize: %d\n", fileSize)
}

And then I do go run . /tmp/myiso.iso /large.dat.


deitch commented Jan 2, 2025

Oh, how interesting. I created my iso file:

mkdir /tmp/iso/
echo abc > /tmp/iso/abc
echo def > /tmp/iso/def
dd if=/dev/urandom of=/tmp/iso/large.dat bs=1M count=5000
xorriso -as mkisofs -o /tmp/myiso.iso -iso-level 3 /tmp/iso

So I should have a large ~5GB file inside the iso.

This is where it gets interesting. I stepped through it, and when it read the root directory, I looked at the entries:

        *{
                extAttrSize: 0,
                location: 35,
                size: 4294965248,
                creation: (*time.Time)(0xc0001f8cc0),
                isHidden: false,
                isSubdirectory: false,
                isAssociated: false,
                hasExtendedAttrs: false,
                hasOwnerGroupPermissions: false,
                hasMoreEntries: true,
                isSelf: false,
                isParent: false,
                volumeSequence: 1,
                filesystem: *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.FileSystem")(0xc0001fc090),
                filename: "LARGE.DAT;1",
                extensions: []github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension len: 3, cap: 4, [
                        *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension")(0xc0001a2400),
                        *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension")(0xc0001a2410),
                        *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension")(0xc0001a2420),
                ],},
        *{
                extAttrSize: 0,
                location: 2097186,
                size: 947914752,
                creation: (*time.Time)(0xc0001f8ef0),
                isHidden: false,
                isSubdirectory: false,
                isAssociated: false,
                hasExtendedAttrs: false,
                hasOwnerGroupPermissions: false,
                hasMoreEntries: false,
                isSelf: false,
                isParent: false,
                volumeSequence: 1,
                filesystem: *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.FileSystem")(0xc0001fc090),
                filename: "LARGE.DAT;1",
                extensions: []github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension len: 3, cap: 4, [
                        *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension")(0xc0001a2480),
                        *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension")(0xc0001a2490),
                        *(*"github.com/diskfs/go-diskfs/filesystem/iso9660.directoryEntrySystemUseExtension")(0xc0001a24a0),
                ],},

There are two entries for large.dat: the first has the normal max size of 4294965248, the second 947914752, and if you put them together, 4294965248 + 947914752 = 5,242,880,000, which is the actual file size:

$ ls -l /tmp/iso/
total 5120012
-rw-rw-r-- 1 ubuntu ubuntu          4 Jan  2 08:56 abc
-rw-rw-r-- 1 ubuntu ubuntu          4 Jan  2 08:56 def
-rw-rw-r-- 1 ubuntu ubuntu 5242880000 Jan  2 08:57 large.dat
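Those sizes are consistent with a 32-bit data-length field split on 2048-byte sector boundaries: 4294965248 is the largest multiple of the 2048-byte sector size that still fits in 32 bits. A quick sanity check of that arithmetic (my own interpretation, not anything from the library):

```go
package main

import "fmt"

func main() {
	const sector = 2048
	first := uint64(4294965248)  // size field of the first directory record
	second := uint64(947914752)  // size field of the second record
	actual := uint64(5242880000) // real size of large.dat on disk

	// Largest sector-aligned value representable in the 32-bit
	// data-length field of an ISO 9660 directory record.
	maxExtent := (uint64(1)<<32 - 1) / sector * sector

	fmt.Println(first == maxExtent)     // prints true
	fmt.Println(first+second == actual) // prints true
}
```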

It looks like the library needs additional support to handle the file being larger than the normal 4GB max size.

I am not sure how the iso filesystem indicates that the second is just a continuation of the first, if it is just the filename or something else.

I definitely am open to a PR to fix this.
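For whoever picks this up: the dump above shows hasMoreEntries: true on the first LARGE.DAT;1 record and false on the second, so one plausible fix is to merge consecutive records for the same name while that flag is set. A minimal sketch using a hypothetical simplified entry type, not the library's actual structs:

```go
package main

import "fmt"

// entry is a simplified stand-in for a directory record;
// hasMoreEntries mirrors the flag seen in the debugger dump.
type entry struct {
	filename       string
	size           uint64
	hasMoreEntries bool
}

// mergeExtents collapses consecutive records that belong to one file:
// while a record's hasMoreEntries flag is set, the next record with
// the same filename continues it, and the sizes add up to the full size.
func mergeExtents(entries []entry) []entry {
	var out []entry
	for _, e := range entries {
		n := len(out)
		if n > 0 && out[n-1].hasMoreEntries && out[n-1].filename == e.filename {
			out[n-1].size += e.size
			out[n-1].hasMoreEntries = e.hasMoreEntries
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	entries := []entry{
		{"ABC.;1", 4, false},
		{"LARGE.DAT;1", 4294965248, true},
		{"LARGE.DAT;1", 947914752, false},
	}
	for _, e := range mergeExtents(entries) {
		fmt.Printf("%s %d\n", e.filename, e.size)
	}
	// prints:
	// ABC.;1 4
	// LARGE.DAT;1 5242880000
}
```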
