Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with bput #64

Closed
jnimoth opened this issue Dec 16, 2024 · 23 comments
Closed

Issues with bput #64

jnimoth opened this issue Dec 16, 2024 · 23 comments
Assignees
Labels
bug Something isn't working need confirmation

Comments

@jnimoth
Copy link

jnimoth commented Dec 16, 2024

A user of ours experiences issues when trying to upload with gocmd bput. I tried myself today and also got a lot of errors when trying bput command.

He a simple example:

$ ll bput_test
total 3.4G
-rw-rw-r-- 1 jelte jelte 954M Dec 16 14:33 1GB_Testfile
-rw-rw-r-- 1 jelte jelte 1.9G Dec 16 14:33 2GB_Testfile
-rw-rw-r-- 1 jelte jelte 477M Dec 16 14:33 500MB_Testfile
-rw-rw-r-- 1 jelte jelte  48M Dec 16 14:33 50MB_Testfile

Trying to upload via gocmd bput -d --min_file_num 1 bput_test/ gives an error (see attached log bput.log).

I also end up with a non-good replica in .gocmd_staging as well as parts of the files being transferred:

$ ils -L bput_test
/testrug/home/[email protected]/bput_test:
  rdms-testers      0 rootResc;mnt_nfsirods    500000000 2024-12-16.15:02 & 500MB_Testfile
        generic    /mnt/nfsirods/home/[email protected]/bput_test/500MB_Testfile
  rdms-testers      0 rootResc;mnt_nfsirods     50000000 2024-12-16.15:02 & 50MB_Testfile
        generic    /mnt/nfsirods/home/[email protected]/bput_test/50MB_Testfile
$ ils -L .gocmd_staging
/testrug/home/[email protected]/.gocmd_staging:
  rdms-testers      0 rootResc;mnt_nfsirods            0 2024-12-16.15:02 ? bundle_e7752d5110e9db4b6ce5ade52e2c6da4.tar
        generic    /mnt/nfsirods/home/[email protected]/.gocmd_staging/bundle_e7752d5110e9db4b6ce5ade52e2c6da4.tar

If I remove the 1GB and 2GB files from the folder, the bput transfer succeeds with the same parameters.

I also tried the original folder once more, but specified a bigger --max_file_size than the default (precisely: gocmd bput -d --min_file_num 1 --max_file_size "21474836480" bput_test/. This also resulted in an error, but a different one (see attached log bput_again.log). In this case, the upload location is completely empty and I also end up with a non-good replica in .gocmd_staging.

$ ils -L .gocmd_staging
/testrug/home/[email protected]/.gocmd_staging:
  rdms-testers      0 rootResc;mnt_nfsirods            0 2024-12-16.15:15 ? bundle_b5a4a757b693c4bfb0c96b46fa6b0ad7.tar
        generic    /mnt/nfsirods/home/[email protected]/.gocmd_staging/bundle_b5a4a757b693c4bfb0c96b46fa6b0ad7.tar

I also saw that when I use the --icat flag, the same upload seems to succeed, but I do not understand why so far.

So

$ gocmd bput --icat -d --min_file_num 1 bput_test/

results in all files being upload. No errors or non-good replicas in this case (see attached log bput_icat_default_size.log).

We are using iRODS 4.3.2 + PAM in the background. It is for me currently not clear if this is an issue with bput, our system, or wrong use of the command.

bput.log
bput_again.log
bput_icat_default_size.log

@iychoi
Copy link
Member

iychoi commented Dec 18, 2024

I believe there is a bug in redirect-to-resource feature which implements direct access to resource server.
Uploading a large bundle file failed due to the issue, therefore, you are seeing non-good replicas.

It is also possible that firewalls in the network disconnect the iRODS connection for some reasons. We have the firewall issue in CyVerse and we are waiting for patching the university firewall.

Anyway, the issue is not limited to bput, so can you try uploading large files using put with or without --icat? If you see the same issue with put without --icat, it's clearly redirect-to-resource bug. If put with --icat also fails, it's something else, like firewall, this is what we are facing in CyVerse.

@jnimoth
Copy link
Author

jnimoth commented Dec 19, 2024

I tested gocmd put with a 10GB file.

Normal put without any extra flags give an error as well for this file:

./gocmd put --progress /media/jelte/data_drive/10GB_Testfile 
/media/jelte/data_drive/10GB_Testfile                    ... done! [10.00GB in 1m27.29s]
Unexpected error!
Error Trace:
  - failed to perform parallel jobs:
    github.com/cyverse/gocommands/cmd/subcmd.(*PutCommand).Process
        /github/workspace/cmd/subcmd/put.go:213
  - failed to upload "/media/jelte/data_drive/10GB_Testfile" to "/rug/home/[email protected]/10GB_Testfile":
    github.com/cyverse/gocommands/cmd/subcmd.(*PutCommand).schedulePut.func1
        /github/workspace/cmd/subcmd/put.go:412
  - failed to upload data object chunk "/media/jelte/data_drive/10GB_Testfile" to resource server:
    github.com/cyverse/go-irodsclient/irods/fs.UploadDataObjectToResourceServer.func1
        /go/pkg/mod/github.com/cyverse/[email protected]/irods/fs/data_object_resource_server.go:781
  - failed to read transfer header from resource server:
    github.com/cyverse/go-irodsclient/irods/fs.uploadDataObjectChunkToResourceServer
        /go/pkg/mod/github.com/cyverse/[email protected]/irods/fs/data_object_resource_server.go:438
  - failed to receive data:
    github.com/cyverse/go-irodsclient/irods/connection.(*IRODSResourceServerConnection).RecvWithTrackerCallBack
        /go/pkg/mod/github.com/cyverse/[email protected]/irods/connection/resource_server_connection.go:362
  - failed to read from socket:
    github.com/cyverse/go-irodsclient/irods/util.ReadBytesWithTrackerCallBack
        /go/pkg/mod/github.com/cyverse/[email protected]/irods/util/io.go:28
  - EOF

And the file arrives as expected in a non-good state in the end:

ils -L 10GB_Testfile
  j.p.nimoth@r      0 rootResc;randy;pt0;mnt_nfsirods0            0 2024-12-19.15:16 ? 10GB_Testfile
        generic    /mnt/nfsirods/home/[email protected]/10GB_Testfile

Trying the same with --icat uploads the file without any errors!

Additionally, I also tried iput with a smaller files:
Interestingly, a 1GB Testfile did not give the error with put, even without the --icat flag. Using a 2GB file gave the error as shown above. Also tried 3GB which gave me an error as well. To my surprise, a 5GB file was uploaded in the same way without an issue (tried two times, two times without error).

@iychoi
Copy link
Member

iychoi commented Jan 7, 2025

Can you try v0.10.9 to check if the issue still exists?

@jnimoth
Copy link
Author

jnimoth commented Jan 13, 2025

I re-tried today the same things as before with the current v0.10.9 version. For me, the issue seems to be still there.
So again using this source folder:

$ ll bput_test/
total 3.4G
-rw-rw-r-- 1 jelte jelte 954M Jan 13 10:10 1GB_Testfile
-rw-rw-r-- 1 jelte jelte 1.9G Jan 13 10:10 2GB_Testfile
-rw-rw-r-- 1 jelte jelte 477M Jan 13 10:10 500MB_Testfile
-rw-rw-r-- 1 jelte jelte  48M Jan 13 10:09 50MB_Testfile

First, running ./gocmd bput -d --min_file_num 1 ./bput_test/ results in:

  • Incomplete transfer (50 + 500 MB are uploaded)
  • non-good replica in .gocmd_staging.
  • bput_min_1_max_size_default.log for the log file.

bput_min_1_max_size_default.log

Content of .gocmd_staging and dest. directory:

$ ils -L .gocmd_staging bput_test/ 
/rug/home/[email protected]/.gocmd_staging:
  j.p.nimoth@r      0 rootResc;randy;pt0;mnt_nfsirods0            0 2025-01-13.10:22 ? bundle_4399661cc3b529a04b2b753e4dada537.tar
        generic    /mnt/nfsirods/home/[email protected]/.gocmd_staging/bundle_4399661cc3b529a04b2b753e4dada537.tar
/rug/home/[email protected]/bput_test:
  j.p.nimoth@r      0 rootResc;randy;pt0;mnt_nfsirods0    500000000 2025-01-13.10:22 & 500MB_Testfile
        generic    /mnt/nfsirods/home/[email protected]/bput_test/500MB_Testfile
  j.p.nimoth@r      0 rootResc;randy;pt0;mnt_nfsirods0     50000000 2025-01-13.10:22 & 50MB_Testfile
        generic    /mnt/nfsirods/home/[email protected]/bput_test/50MB_Testfile

Second, running with the ./gocmd bput -d --min_file_num 1 --max_file_size "21474836480" ./bput_test/:

  • Upload dest. completely empty
  • a single, non-good replica at .gocmd_staging
  • Loogs in bput_min_1_max_size_21474836480.log

bput_min_1_max_size_21474836480.log

Third, running with --icat does result in it to work. All files uploaded to dest and no non-good replicas in .gocmd_staging

Fourth, I also re-tried uploading a single, bigger file using gocmd put with our without --icat flag, so ./gocmd put -d (--icat) /media/jelte/data_drive/10GB_Testfile.
Running without --icat gives again an error (see attached 10GB_put_no_icat.log) and results in a non-good replica:

$ ils -L 10GB_Testfile
  j.p.nimoth@r      0 rootResc;randy;pt0;mnt_nfsirods0            0 2025-01-13.10:39 ? 10GB_Testfile
        generic    /mnt/nfsirods/home/[email protected]/10GB_Testfile

10GB_put_no_icat.log

The same with --icat does work.

@bartns
Copy link

bartns commented Feb 19, 2025

Experiencing similar issues when using gocmd sync

@iychoi
Copy link
Member

iychoi commented Feb 19, 2025

@bartns can you share your full command-line inputs? By default sync does not use bput command anymore. So if this is true, it means --icat is also unstable.

@iychoi iychoi added the bug Something isn't working label Feb 19, 2025
@iychoi iychoi self-assigned this Feb 19, 2025
@bartns
Copy link

bartns commented Feb 19, 2025

Its the most default usage. It's part of an upload script we have running at the end of an analysis.

gocmd sync --progress $destination_path i:$destination_path_parent

It usually goes fine. But sometimes we do get a similar issues like this one. We have not yet put effort to properly reproduce it though. So not sure if it's the same issue

@jnimoth
Copy link
Author

jnimoth commented Feb 19, 2025

I would like to say that I saw today that I also see issues with simple gocmd sync in our system. The issue seems to only manifest if there are files of a certain size from what I saw so far.

For example, I have several subfolders each with 10 files of different sizes (500 MB, 1GB, 5GB) that I try to sync.
First, for 500MB and 1GB files all is good, no errors and files on iRODS appear fine (not shown).

But with 5GB test files, I get:

$ ./gocmd sync /media/jelte/data_drive/test_files/5GB/ i:/testrug/home/[email protected]/gocmd_tests/5GB_dest
File or directory "/testrug/home/[email protected]/gocmd_tests/5GB_dest/5GB/5GB_Testfile_10" is not found!

And on iRODS side, I see:

$ ils -Lr gocmd_tests/5GB_dest
/testrug/home/[email protected]/gocmd_tests/5GB_dest:
  C- /testrug/home/[email protected]/gocmd_tests/5GB_dest/5GB
/testrug/home/[email protected]/gocmd_tests/5GB_dest/5GB:
  rdms-testers      0 rootResc;mnt_nfsirods            0 2025-02-19.17:02 ? 5GB_Testfile_1
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest/5GB/5GB_Testfile_1
  rdms-testers      0 rootResc;mnt_nfsirods            0 2025-02-19.17:03 ? 5GB_Testfile_10
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest/5GB/5GB_Testfile_10

I also tried if the behavior changes if I additionally use --icat, and it seems to succeed then for the same data and environment:

$./gocmd sync --icat /media/jelte/data_drive/test_files/5GB/ i:/testrug/home/[email protected]/gocmd_tests/5GB_dest_2

And on the iRODS side all files are now uploaded, no error and all with good status:

$ ils -Lr gocmd_tests/5GB_dest_2
/testrug/home/[email protected]/gocmd_tests/5GB_dest_2:
  C- /testrug/home/[email protected]/gocmd_tests/5GB_dest_2/5GB
/testrug/home/[email protected]/gocmd_tests/5GB_dest_2/5GB:
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:05 & 5GB_Testfile_1
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_1
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:06 & 5GB_Testfile_10
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_10
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:07 & 5GB_Testfile_2
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_2
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:08 & 5GB_Testfile_3
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_3
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:09 & 5GB_Testfile_4
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_4
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:09 & 5GB_Testfile_5
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_5
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:10 & 5GB_Testfile_6
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_6
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:11 & 5GB_Testfile_7
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_7
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:12 & 5GB_Testfile_8
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_8
  rdms-testers      0 rootResc;mnt_nfsirods   5000000000 2025-02-19.17:12 & 5GB_Testfile_9
        generic    /mnt/nfsirods/home/[email protected]/gocmd_tests/5GB_dest_2/5GB/5GB_Testfile_9

I am not sure if that is the same that @bartns sees or if this is directly related to the issue with bput that I reported here, but there seems to be something not quite right.
For comparison, I also tried the 5GB folder ones via icommands using irsync -rv. It also suceeded.

@iychoi
Copy link
Member

iychoi commented Feb 19, 2025

@bartns can you double check gocommands version? I recently made a change on sync to use --icat by default. So if you are using old gocommands, the issue is the same one that @jnimoth encountered.

@bartns
Copy link

bartns commented Feb 19, 2025

Sorry should have mentioned that. Most recent case was yesterday on the latest version: v0.10.9

@iychoi
Copy link
Member

iychoi commented Feb 19, 2025

Ah.. I was confused.
Both sync and bput use redirect-to-resource by default, not icat for transfer. I think redirect-to-resource sometimes goes wrong.

@bartns can you check if the same failure occurs when you try the same command again? I need to reproduce the issue on my side to debug.

@bartns
Copy link

bartns commented Feb 19, 2025

Yes, I can reproduce the same issue with sync. Already happened with a 1.4GB file. I think @jnimoth pretty much did the same. It succeeds with --icat

@iychoi
Copy link
Member

iychoi commented Feb 19, 2025

Can you tell me the exact size in bytes for the 1.4GB file?
Also, can you tell me your iRODS version?

@bartns
Copy link

bartns commented Feb 19, 2025

1429448198

irods version 4.3.3

@iychoi
Copy link
Member

iychoi commented Feb 19, 2025

I currently do not have test environment for 4.3.3. I'll need to work on making a new docker-based test environment first. This will take some time.

@trel
Copy link
Contributor

trel commented Feb 20, 2025

https://github.com/irods/irods_testing_environment may provide an easy way to stand things up.

@trel
Copy link
Contributor

trel commented Feb 20, 2025

https://github.com/irods/irods_demo might also help - just turn off the containers you don't need.

@jjkoehorst
Copy link

@iychoi need access to our iRODS instance? @bartns

@iychoi
Copy link
Member

iychoi commented Mar 6, 2025

I was able to find the cause. Will fix it soon.

@iychoi
Copy link
Member

iychoi commented Mar 6, 2025

@jnimoth @jjkoehorst Can you confirm if new release fixes the issue?
https://github.com/cyverse/gocommands/releases/tag/v0.10.17

@bartns
Copy link

bartns commented Mar 7, 2025

Thank you for your work! Highly appreciated.

I can confirm that with sync it transfers now correctly (with and without --icat) .
However --progress does not give any output anymore.

@jnimoth
Copy link
Author

jnimoth commented Mar 7, 2025

I also repeated the upload that I reported above as failing (using the 5GB files) with the new version: It now worked to upload the data using gocmd sync without any additional flags.

I can't confirm @bartns note that --progress does not work anymore. I also tried a sync with this additional flag. It showed progress for me as expected.

@iychoi
Copy link
Member

iychoi commented Mar 7, 2025

@jnimoth Thank you for the confirmation.
@bartns --progress does not work well with -d as it prints out a lot of debugging messages. Please open another ticket if the --progress issue persists.

@iychoi iychoi closed this as completed Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working need confirmation
Projects
None yet
Development

No branches or pull requests

5 participants