
kvs: investigate and prevent potential denial of service #6125

Open
chu11 opened this issue Jul 19, 2024 · 5 comments

Comments
chu11 (Member) commented Jul 19, 2024

Related to #6124, there may be some denial of service possibilities in the KVS.

The first one that comes to mind is that there is no limit on the size of a kvstxn that can be sent to the KVS, like it could be a billion entries long.

(Mildly related, #1202, #1206, #1207, where some data could become crazy huge)

Edit: #5613 also related, although that issue is a bit more broad.

Edit: #6112 mildly related, it's sort of like a DoS

Edit: also ... I don't think we have a limit on key path depth; paths can seemingly be arbitrarily deep ("a.b.c.d......a.a.a.a.a.a.a ...")
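
For reference, a rough sketch of how a client could construct such an unbounded transaction with the public libkvs API (the key names and the NOPS value below are arbitrary, made up just for illustration):

```c
#include <stdio.h>
#include <flux/core.h>

/* Illustrative only: build one transaction with a huge number of ops
 * and commit it.  NOPS is arbitrary; nothing in the API stops a much
 * larger value.
 */
#define NOPS 1000000

int main (void)
{
    flux_t *h;
    flux_kvs_txn_t *txn;
    char key[64];

    if (!(h = flux_open (NULL, 0)))
        return 1;
    if (!(txn = flux_kvs_txn_create ()))
        return 1;
    for (int i = 0; i < NOPS; i++) {
        snprintf (key, sizeof (key), "dos.test.key%d", i);
        if (flux_kvs_txn_put (txn, 0, key, "x") < 0)
            return 1;
    }
    /* the entire op list travels to the kvs module in one commit RPC */
    flux_future_t *f = flux_kvs_commit (h, NULL, 0, txn);
    if (!f || flux_future_get (f, NULL) < 0)
        fprintf (stderr, "commit failed\n");
    flux_future_destroy (f);
    flux_kvs_txn_destroy (txn);
    flux_close (h);
    return 0;
}
```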

garlick (Member) commented Jul 19, 2024

It might be interesting to survey blob sizes currently in the content cache on el cap. We could just look at one of the dump files I guess and make a histogram.

The original design for content was to have a 1MB limit on blobs, FWIW.

Edit: oh, I guess surveying keys in the dump is not the same as blob sizes but it would still be interesting.
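
One low-tech way to build that histogram, not tied to any particular dump format: feed byte counts (one per line, e.g. the size column of a tar listing of a flux-dump archive) into something like this sketch, which buckets them into power-of-two bins:

```c
#include <stdio.h>
#include <stdlib.h>

/* Bucket byte counts (one per line on stdin) into power-of-two bins,
 * e.g.:  tar tvf dump.tar | awk '{print $3}' | ./histogram
 */
int main (void)
{
    unsigned long long counts[64] = { 0 };
    char line[256];
    int maxbin = 0;

    while (fgets (line, sizeof (line), stdin)) {
        unsigned long long size = strtoull (line, NULL, 10);
        int bin = 0;
        while ((size >>= 1))    /* bin = floor(log2(size)), 0 for size <= 1 */
            bin++;
        counts[bin]++;
        if (bin > maxbin)
            maxbin = bin;
    }
    for (int i = 0; i <= maxbin; i++)
        printf ("[%llu - %llu): %llu\n",
                i == 0 ? 0ULL : (1ULL << i),
                (1ULL << (i + 1)),
                counts[i]);
    return 0;
}
```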

chu11 (Member, Author) commented Jan 15, 2025

> The first one that comes to mind is that there is no limit on the size of a kvstxn that can be sent to the KVS, like it could be a billion entries long.

This is the most obvious DoS that could occur in the KVS and is probably the first one to tackle.

The simplest solution would be to have some configurable max.

That said, we don't actually know the min, max, mean, stddev, etc. of the number of ops per KVS transaction. So perhaps getting some stats out of the KVS on that would be the first thing to do. It should be easy to add to the KVS stats output using tstat.
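
For illustration, roughly the kind of running statistics that implies (this is just Welford's streaming mean/stddev, i.e. what a tstat-style accumulator would track; the struct and function names here are made up):

```c
#include <math.h>
#include <stdio.h>

/* Running min/max/mean/stddev of ops-per-transaction, accumulated one
 * commit at a time (Welford's algorithm).  Names are illustrative only.
 */
struct opstats {
    int n;
    double min, max;
    double mean, m2;    /* m2 = sum of squared deviations from the mean */
};

static void opstats_push (struct opstats *s, double nops)
{
    if (s->n == 0 || nops < s->min)
        s->min = nops;
    if (s->n == 0 || nops > s->max)
        s->max = nops;
    s->n++;
    double delta = nops - s->mean;
    s->mean += delta / s->n;
    s->m2 += delta * (nops - s->mean);
}

static double opstats_stddev (const struct opstats *s)
{
    return s->n > 1 ? sqrt (s->m2 / (s->n - 1)) : 0.;
}

int main (void)
{
    struct opstats s = { 0 };
    /* e.g. one push per commit request, with that txn's op count */
    double sample[] = { 1, 4, 2, 128, 3 };
    for (size_t i = 0; i < sizeof (sample) / sizeof (sample[0]); i++)
        opstats_push (&s, sample[i]);
    printf ("n=%d min=%.0f max=%.0f mean=%.2f stddev=%.2f\n",
            s.n, s.min, s.max, s.mean, opstats_stddev (&s));
    return 0;
}
```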

chu11 (Member, Author) commented Jan 22, 2025

> The first one that comes to mind is that there is no limit on the size of a kvstxn that can be sent to the KVS, like it could be a billion entries long.

It occurred to me today that there are really two issues:

  • maximum number of ops in a single transaction
  • total number of ops a user has submitted

The former has a "quality of service" feel to it, in that other transactions can sneak in between multiple transactions that each carry MAX operations. This is certainly issue number one.

The latter is the case where a user submits MAX ops over and over again; we'd want to "rate limit" total ops. This might be issue two.
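
For the rate limiting idea, a per-user token bucket is one common approach; here is a purely hypothetical sketch (none of these names exist in the KVS module) of how commits could be charged against an op budget:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-user token bucket: each user accumulates "op credits"
 * at `rate` ops/sec up to `burst`, and a commit is rejected (or deferred)
 * if it asks for more credits than are available.
 */
struct opbucket {
    double tokens;      /* currently available op credits */
    double rate;        /* credits replenished per second */
    double burst;       /* maximum credits that can accumulate */
    double last;        /* timestamp of last update (seconds) */
};

static bool opbucket_take (struct opbucket *b, double now, int nops)
{
    b->tokens += (now - b->last) * b->rate;
    if (b->tokens > b->burst)
        b->tokens = b->burst;
    b->last = now;
    if (b->tokens < nops)
        return false;   /* over the limit - reject or defer this commit */
    b->tokens -= nops;
    return true;
}

int main (void)
{
    struct opbucket b = { .tokens = 1000, .rate = 1000, .burst = 1000 };
    /* two back-to-back commits of 800 ops at t=0: the second is denied */
    printf ("commit 1: %s\n", opbucket_take (&b, 0., 800) ? "ok" : "denied");
    printf ("commit 2: %s\n", opbucket_take (&b, 0., 800) ? "ok" : "denied");
    /* after one second, enough credits have replenished to proceed */
    printf ("commit 3: %s\n", opbucket_take (&b, 1., 800) ? "ok" : "denied");
    return 0;
}
```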

garlick (Member) commented Jan 22, 2025

Can you refresh my memory? If a user commits several large blobs in a transaction, is this correct:

  • the blobs are base64 encoded and included in the commit request
  • the commit request is handled by the local kvs rank, which commits the blobs to the content store and alters the commit request to only contain content references, not the data, when it sends it upstream
  • so the rank 0 kvs only deals with blobrefs not blobs (unless it's handling commits started on rank 0)

My biggest worry about the KVS is the size of messages, because that can cause head-of-line blocking at the broker level. I'm not as worried about it on rank > 0 compared to rank 0, though.

And I guess it's obvious but worth repeating because I don't see it mentioned here: it's guests in the system instance (or owner services operating on behalf of a guest) that can really do the damage.

chu11 (Member, Author) commented Jan 22, 2025

> Can you refresh my memory? If a user commits several large blobs in a transaction, is this correct:
>
>   • the blobs are base64 encoded and included in the commit request
>   • the commit request is handled by the local kvs rank, which commits the blobs to the content store and alters the commit request to only contain content references, not the data, when it sends it upstream
>   • so the rank 0 kvs only deals with blobrefs not blobs (unless it's handling commits started on rank 0)

No, everything is forwarded to rank 0.

But you bring up a good point: the size of the blobs is also an issue. My initial crack at this was the length of the txn operations, i.e. a billion-entry KVS txn.

Edit: the size issue is semi-covered by #6265; I guess I didn't initially consider it a DoS issue, but it can fall into that camp too.
