-
Notifications
You must be signed in to change notification settings - Fork 13.3k
core: implement DeterministicRandomSource
#131607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
ACP: rust-lang/libs-team#394 Tracking issue: rust-lang#131606 The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG. Also, I've made one minor change to the API proposed in the ACP: in accordance with [C-GETTER](https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter), `get_seed` is now named `seed`.
9463326
to
23d15b6
Compare
/// [RFC 8439](https://datatracker.ietf.org/doc/html/rfc8439), but with reduced rounds. | ||
#[doc(hidden)] | ||
#[unstable(feature = "deterministic_random_internals", issue = "none")] // Used for testing only. | ||
pub mod chacha { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while it could always be optimized in the future, I think it should at least be optimized (with SIMD) before stabilization. it would be quite unfortunate to land an rng in std that's slower than the rand crate, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that note, it may be a good idea to borrow two tweaks from chacha8rand: defining the output to interleave multiple ChaCha20/8 blocks in the way 128-bit SIMD naturally does, and only adding the key to the final state matrix without also adding constants/counters/nonce to the other parts of the matrix. Both tweaks change the output, so they can't be done after stabilization, but they don't affect the quality and make SIMD implementations a little faster and simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I finally published my implementation of chacha8rand yesterday. I'm not suggesting that core should use this exact algorithm, let alone my implementation of it, but it may be useful as a reference for fast ChaCha8 SIMD implementations (including the aforementioned tweaks). Besides the portable scalar implementation I wrote SSE2, AVX2, AArch64 NEON, and wasm simd128 backends - much more platform coverage than rand_chacha
currently has. The code is simpler and more self-contained than the chacha20
crate because it doesn't need to be parametrized over number of rounds and algorithm variants or integrate with crypto traits/infrastructure.
(Edit: But I did have to complicate some things for the sake of runtime feature detection, which core doesn't have right now. If static detection of SSE2 is good enough, the whole Backend
indirection could be ripped out.)
If std provides a seeded PRNG, I think it should be one that can support reseeding, fast skipping, and forking in its API. |
AFAICT, ChaCha supports random access /seeking, and you can always replace the seed. |
Seems reasonable. |
|
can accessors be added for |
I think that should happen in a second pass. And ideally we should encapsulate all the state in a structure rather than requiring users to request and pass several separate values. (That's separate from the seed accessor, which still needs to exist.) |
@traviscross Any thoughts about the proposed switch to chacha8rand? It seems reasonable to me, but additional scrutiny would help. |
Note that chacha8rand does not support seeking, which is sometimes desirable. You can store the current position in the byte stream compactly, but you can’t seek forward or backwards to arbitrary positions by just adjusting a block counter. However, the output-affecting optimizations (interleaving several chacha8 blocks, skipping addition of constants/counters/nonce to the final state) introduced by chacha8rand are interesting even if seeking is desired. |
so why not use chacha8rand except extend the counter/nonce back to 128 bits and don't constantly reseed, it should be easily seekable then... |
Right, that's what you'd do to get seeking. It just wouldn't be chacha8rand and should be called something different to avoid confusion. |
Some thoughts: First, as others have mentioned above, I really want to call this Regarding Regarding Regarding the number of rounds, 8 is a kind of an unfortunate number. It's secure today, but with quite limited security margin. (Conversely, it's probably overkill by 2-4 rounds if we don't want security at all, i.e., are OK with some detectable patterns in the output that wouldn't matter for most non-cryptographic applications.) Yes, yes, we're not going for secure here, but that's exactly the problem with this number. This will be easy to use, and it will not have obvious security flaws, so people will use it in places that at least some security is called for. Security needs aren't all or none, of course. If we call it But what I'd like to avoid is that we ship I'd rather people just know what they're getting, and for something meant to be deterministic, it just makes a lot of sense to name the algorithm. |
ChaCha4 is known to fail PractRand so it’s worse than the many PRNGs that pass it with smaller state size, smaller code size, similar or better performance on CPUs without SIMD, and less chance of being confused for something secure. ChaCha6 may be acceptable for statistical purposes but the performance advantage over eight rounds is small. For long-output throughput, the maximum theoretical speed up is 8/6 = 1.33x, but this ignores the round-independent costs of per-block setup/finalization, internal buffer management, and actually doing something with the output. For shorter outputs such as single integers, the speedup will be even less because consuming a buffer in four or eight byte increments is a significant chunk of the work when filling the buffer takes less than a cycle per byte. For example, when I benchmark my chacha8rand implementation, the AVX2 backend is 2x faster than the SSE2 backend for reading several hundred bytes at a time but Reducing rounds also doesn’t address the drawbacks of generating output in multiples of 64 bytes (some implementation complexity and increased state size if you buffer internally; terrible performance for small reads if you don’t). If ChaCha8 is considered overkill then the logical conclusion to me is an entirely different algorithm that isn’t block based, not fewer rounds. |
Is there such an algorithm today? It’s not clear to me that you can achieve similar or better diffusion across a larger state size with fewer operations per byte/word. Plus, a larger block size means more live values to keep in registers, which disproportionately penalizes processors without 128+ bit SIMD. ISAs with 16+ general purpose registers can compute one ChaCha block at a time without spilling to the stack (or only very occasionally with 16 GPRs), but that’s not true when a single block is 4x larger. |
Sorry if I've missed it but is there a particular reason we would prefer to use ChaCha? I mean, it makes sense in the context of this PR as a kind of placeholder for whatever the final algorithm is. But otherwise it seems like it's not really worth getting in the weeds over unless there isn't a more obviously fitting choice? |
It's an unresolved question in the tracking issue, there was a little discussion about it there and much more on the ACP. I don't get the impression that there's any consensus about it. Speaking only for myself, I did not see any alternative suggestion that's obviously better choice than some suitable ChaCha variant. It's just a well understood option in the space, and has enough knobs that you can get easily get into the weeds over. |
I've now implemented buffering and added some tests. As for other algorithms, I'd prefer it if that discussion could be moved to the tracking issue? |
Nit:
The citation given does not support this claim. It says that ChaCha4 does pass Practrand (but that ChaCha3 does not):
That same result is also reported here: |
As the cited article says: If it's not cryptographic to begin with, why chacha of any round count at all? |
You're right, sorry for the confusion. I might have misread the "up to 512 GB" bit in the footnote (as "... but fails with more output"). |
I've implemented the suggestions, thank you everyone! @joshtriplett I'm thinking about renaming |
Co-authored-by: Josh Triplett <[email protected]>
☔ The latest upstream changes (presumably #135937) made this pull request unmergeable. Please resolve the merge conflicts. |
ACP: rust-lang/libs-team#394
Tracking issue: #131606
The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG.
Also, I've made one minor change to the API proposed in the ACP: in accordance with C-GETTER,
get_seed
is now namedseed
.r? @joshtriplett