NSFS | Fix Bug | Race Between List Object and Delete Object #8809

Merged (1 commit) on Mar 6, 2025

@shirady shirady commented Feb 19, 2025

Explain the changes

  1. Replace the function stat_ignore_eacces with stat_if_exists, which also ignores ENOENT and ENOTDIR (in addition to EACCES, which is ignored when the config.EACCES_IGNORE_ENTRY flag is set).
  2. Call stat on the entry_path once and save it, instead of calling it concurrently, so that we can be sure the entry is not removed from the results array afterwards. Move the stat call before results.push(r) and results.splice(pos, 0, r), as both of them add a result to the array.
  3. Add a comment before the check_access call; see also the description of PR #6576 (Added check_access() call to list objects).

This change continues PR #8751 by adding a case that ignores an entry whose stat failed.
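
To illustrate item 1, here is a minimal sketch of the "stat, but tolerate a missing entry" idea. It is not the code from this PR: it uses Node's built-in fs/promises in place of nb_native().fs.stat, and the function name stat_if_exists_sketch and the ignore_eacces option are assumptions made for this example (the option stands in for the config flag).

```javascript
const fs = require('fs/promises');

// Sketch only: returns the stat result, or undefined when the entry
// cannot be seen (ENOENT / ENOTDIR, and EACCES if ignore_eacces is set).
// ignore_eacces is a hypothetical stand-in for the config flag.
async function stat_if_exists_sketch(entry_path, { use_lstat = false, ignore_eacces = false } = {}) {
    try {
        return use_lstat ? await fs.lstat(entry_path) : await fs.stat(entry_path);
    } catch (err) {
        const ignorable = err.code === 'ENOENT' || err.code === 'ENOTDIR' ||
            (ignore_eacces && err.code === 'EACCES');
        if (ignorable) {
            console.warn(`could not access ${entry_path} (${err.code}), skipping`);
            return undefined;
        }
        // Any other failure is still a real error for the caller.
        throw err;
    }
}
```

A caller would treat an undefined return value as "leave this entry out of the listing" and let every other error propagate.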

Issues: Fixed DFBUGS-1582

  1. Currently, during a list objects operation, stat is called on the entry_path (the path of the key in the bucket on the FS). If a concurrent delete happens, this stat fails with ENOENT because the object was deleted. With this fix, stat returns no value in that case and, as a result, the entry does not appear in the list objects result.
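
The ordering described above (stat once, and only then add the entry to the results) can be sketched as follows. This is an illustration under assumptions, not the PR's code: collect_results and the injected stat_fn are hypothetical names, and stat_fn is assumed to resolve to undefined when the entry no longer exists.

```javascript
// Sketch only: builds a key-sorted results array, skipping entries
// whose stat returned nothing (for example, deleted concurrently).
async function collect_results(keys, bucket_path, stat_fn) {
    const results = [];
    for (const key of keys) {
        const entry_path = `${bucket_path}/${key}`;
        // Stat once, BEFORE the entry is added to results.
        const stat = await stat_fn(entry_path);
        if (!stat) continue; // entry vanished: leave it out of the listing
        const r = { key, stat };
        // Keep results sorted by key: append, or splice into position.
        const pos = results.findIndex(existing => existing.key > key);
        if (pos === -1) {
            results.push(r);
        } else {
            results.splice(pos, 0, r);
        }
    }
    return results;
}
```

Because the stat result is saved on the entry before it is inserted, nothing ever has to be removed from results after the fact.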

Gaps are described in issue #8845 (discussions were documented here).

Testing Instructions:

Automatic Test:

Please run: sudo npx jest test_nsfs_concurrency.test.js

For example, the output should include lines such as:

Mar-3 14:10:13.679 [/69906]    [L0] core.util.native_fs_utils:: stat_if_exists: Could not access file entry_path /private/tmp/test_nsfs_concurrency/my-key-2 error code ENOENT , skipping...
Mar-3 14:10:43.534 [/69906]    [L0] core.util.native_fs_utils:: stat_if_exists: Could not access file entry_path /private/tmp/test_nsfs_concurrency/my-key-1000 error code ENOENT , skipping...

Manual Testing instructions:

We will add code changes to simulate the problem:
Add the sleep function:

sleep(ms) {
    return new Promise(resolve => {
        setTimeout(resolve, ms);
    });
}

We will add the following lines:

console.log('SDSD before stat', this.bucket_path, r.key, fs_context);
console.log('SDSD sleep 8000');
await this.sleep(8000);
stat = await nb_native().fs.stat(fs_context, entry_path, { use_lstat });
  1. Create an account with the CLI: sudo node src/cmd/manage_nsfs account add --name <account-name> --new_buckets_path /Users/buckets/ --access_key <access-key> --secret_key <secret-key> --uid <uid> --gid <gid>
    Note: before creating the account, you need to grant permissions on the new_buckets_path: chmod 777 /Users/buckets/
  2. Start the NSFS server with: sudo node src/cmd/nsfs --debug 5
  3. Create the alias for the S3 service: alias nc-user-1-s3='AWS_ACCESS_KEY_ID=<access-key> AWS_SECRET_ACCESS_KEY=<secret-key> aws --no-verify-ssl --endpoint-url https://localhost:6443'
  4. Check the connection to the endpoint and try to list the buckets (should be empty): nc-user-1-s3 s3 ls; echo $?
  5. Add bucket to the account using AWS CLI: nc-user-1-s3 s3 mb s3://bucket-race (bucket-race is the bucket name in this example)
  6. Put objects in the bucket: echo 'hello_world' | nc-user-1-s3 s3 cp - s3://bucket-race/hello_world1.txt (repeat this and change the key name to hello_world2.txt, hello_world3.txt, hello_world4.txt)
  7. Tab1: Start the list objects: nc-user-1-s3 s3api list-objects-v2 --bucket bucket-race
    Tab2: Delete an object while you see the "SDSD before stat" on the key: nc-user-1-s3 s3api delete-object --bucket bucket-race --key hello_world3.txt (for example)
    Expect the list to be returned without the deleted key, and the logs to contain the added printouts.
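
The manual steps above can also be approximated locally, without an NSFS server or the AWS CLI. The sketch below only imitates the race on a temporary directory using Node's fs/promises. All function names are hypothetical, and the before_stat hook plays the role of the artificial sleep by deleting a key between the directory read and the stat.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

// Sketch only: list a directory, calling before_stat(name) ahead of each stat,
// and skip names that disappeared in between (ENOENT).
async function list_with_hook(dir, before_stat) {
    const names = (await fs.readdir(dir)).sort();
    const listed = [];
    for (const name of names) {
        await before_stat(name);
        try {
            await fs.stat(path.join(dir, name));
            listed.push(name);
        } catch (err) {
            if (err.code !== 'ENOENT') throw err;
        }
    }
    return listed;
}

// Creates four objects, then deletes hello_world3.txt just before it is stat-ed.
async function simulate_race() {
    const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'bucket-race-'));
    try {
        for (let i = 1; i <= 4; i++) {
            await fs.writeFile(path.join(dir, `hello_world${i}.txt`), 'hello_world\n');
        }
        return await list_with_hook(dir, async name => {
            if (name === 'hello_world3.txt') {
                await fs.unlink(path.join(dir, name)); // the "Tab2" delete
            }
        });
    } finally {
        await fs.rm(dir, { recursive: true, force: true });
    }
}
```

Calling simulate_race() resolves to the three remaining keys (hello_world1.txt, hello_world2.txt, hello_world4.txt) instead of failing, which is the behavior the manual test expects from the list operation after the change.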
Before the change

The list operation failed with: An error occurred (NoSuchKey) when calling the ListObjectsV2 operation: The specified key does not exist.
The logs showed:

Feb-19 9:03:11.560 [nsfs/22301] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Resource>/bucket-race?list-type=2&amp;encoding-type=url</Resource><RequestId>m7bkghze-8fc8b7-e0y</RequestId></Error> GET /bucket-race?list-type=2&encoding-type=url {"host":"localhost:6443","accept-encoding":"identity","user-agent":"aws-cli/2.17.11 md/awscrt#0.20.11 ua/2.0 os/macos#24.2.0 md/arch#arm64 lang/python#3.11.10 md/pyimpl#CPython cfg/retry-mode#standard md/installer#source md/prompt#off md/command#s3api.list-objects-v2","x-amz-date":"20250219T070256Z","x-amz-content-sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","authorization":"AWS4-HMAC-SHA256 Credential=Dwertyuiopasdfg11001/20250219/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=98e78f14c8a60ea9c1442face8f569e040bab077beeff61a124b07349e8eaa29"} Error: No such file or directory - context: Stat _path=/Users/buckets/bucket-race/hello_world3.txt

Logs before the change

  • Doc added/updated
  • Tests added

@shirady shirady self-assigned this Feb 19, 2025
@shirady shirady force-pushed the nsfs-list-race-delete branch from 5463962 to 4ff81c4 Compare February 20, 2025 09:48
@shirady shirady requested a review from guymguym February 20, 2025 11:45
@shirady shirady force-pushed the nsfs-list-race-delete branch 5 times, most recently from ddd6985 to 8ce3b9e Compare February 27, 2025 07:26
@shirady shirady requested a review from romayalon February 27, 2025 08:53
@shirady shirady force-pushed the nsfs-list-race-delete branch from 8ce3b9e to 3f35b9f Compare March 3, 2025 12:15
@shirady shirady force-pushed the nsfs-list-race-delete branch 3 times, most recently from d444e64 to 6ca05a4 Compare March 5, 2025 08:04
@shirady shirady force-pushed the nsfs-list-race-delete branch from ac3aef0 to 1c5964a Compare March 6, 2025 06:09
…eration

Signed-off-by: shirady <[email protected]>
@shirady shirady force-pushed the nsfs-list-race-delete branch from 1c5964a to fe4115c Compare March 6, 2025 07:56
@shirady shirady merged commit 3e038b0 into noobaa:master Mar 6, 2025
11 checks passed
shirady commented Mar 6, 2025

Backport comment - We would like the backport to be with config.NSFS_LIST_IGNORE_ENTRY_ON_EACCES = false; (and not true).


5 participants