core: implement `DeterministicRandomSource` #131607

joboet · 2024-10-12T15:16:03Z

ACP: rust-lang/libs-team#394
Tracking issue: #131606

The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG.

Also, I've made one minor change to the API proposed in the ACP: in accordance with C-GETTER, get_seed is now named seed.

r? @joshtriplett

ACP: rust-lang/libs-team#394 Tracking issue: rust-lang#131606 The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG. Also, I've made one minor change to the API proposed in the ACP: in accordance with [C-GETTER](https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter), `get_seed` is now named `seed`.

library/core/src/random/deterministic.rs

Noratrieb · 2024-10-12T17:13:48Z

library/core/src/random/deterministic.rs

+/// [RFC 8439](https://datatracker.ietf.org/doc/html/rfc8439), but with reduced rounds.
+#[doc(hidden)]
+#[unstable(feature = "deterministic_random_internals", issue = "none")] // Used for testing only.
+pub mod chacha {


while it could always be optimized in the future, I think it should at least be optimized (with SIMD) before stabilization. it would be quite unfortunate to land an rng in std that's slower than the rand crate, I think.

On that note, it may be a good idea to borrow two tweaks from chacha8rand: defining the output to interleave multiple ChaCha20/8 blocks in the way 128-bit SIMD naturally does, and only adding the key to the final state matrix without also adding constants/counters/nonce to the other parts of the matrix. Both tweaks change the output, so they can't be done after stabilization, but they don't affect the quality and make SIMD implementations a little faster and simpler.

FWIW, I finally published my implementation of chacha8rand yesterday. I'm not suggesting that core should use this exact algorithm, let alone my implementation of it, but it may be useful as a reference for fast ChaCha8 SIMD implementations (including the aforementioned tweaks). Besides the portable scalar implementation I wrote SSE2, AVX2, AArch64 NEON, and wasm simd128 backends - much more platform coverage than rand_chacha currently has. The code is simpler and more self-contained than the chacha20 crate because it doesn't need to be parametrized over number of rounds and algorithm variants or integrate with crypto traits/infrastructure.

(Edit: But I did have to complicate some things for the sake of runtime feature detection, which core doesn't have right now. If static detection of SSE2 is good enough, the whole Backend indirection could be ripped out.)

juntyr · 2024-10-12T19:43:34Z

If std provides a seeded PRNG, I think it should be one that can support reseeding, fast skipping, and forking in its API.

joshtriplett · 2024-10-12T19:53:29Z

AFAICT, ChaCha supports random access /seeking, and you can always replace the seed.

joshtriplett · 2024-10-12T20:24:54Z

Also, I've made one minor change to the API proposed in the ACP: in accordance with C-GETTER, get_seed is now named seed.

Seems reasonable.

library/core/src/random/deterministic.rs

Lokathor · 2024-10-13T17:47:30Z

Since the generator is supposed to have a stable output sequence, should we name the generator after the generation method, allowing for other generation methods/sequences to be added in the future?
None of the type docs appear to state the generator's period before the sequence repeats. Can that be given somewhere?

programmerjake · 2024-10-15T16:51:13Z

can accessors be added for counter_nonce and any byte buffer that might be added? this would allow seeking and [de]serializing the state.

joshtriplett · 2024-10-16T04:40:04Z

can accessors be added for counter_nonce and any byte buffer that might be added? this would allow seeking and [de]serializing the state.

I think that should happen in a second pass. And ideally we should encapsulate all the state in a structure rather than requiring users to request and pass several separate values. (That's separate from the seed accessor, which still needs to exist.)

joshtriplett · 2024-10-23T07:36:06Z

@traviscross Any thoughts about the proposed switch to chacha8rand? It seems reasonable to me, but additional scrutiny would help.

hanna-kruppe · 2024-10-23T09:37:12Z

Note that chacha8rand does not support seeking, which is sometimes desirable. You can store the current position in the byte stream compactly, but you can’t seek forward or backwards to arbitrary positions by just adjusting a block counter. However, the output-affecting optimizations (interleaving several chacha8 blocks, skipping addition of constants/counters/nonce to the final state) introduced by chacha8rand are interesting even if seeking is desired.

programmerjake · 2024-10-23T15:35:09Z

so why not use chacha8rand except extend the counter/nonce back to 128 bits and don't constantly reseed, it should be easily seekable then...

hanna-kruppe · 2024-10-23T18:13:46Z

Right, that's what you'd do to get seeking. It just wouldn't be chacha8rand and should be called something different to avoid confusion.

traviscross · 2024-10-23T22:43:57Z

Some thoughts:

First, as others have mentioned above, I really want to call this core::random::ChaCha8Source. This would minimize the possibility of regret. Among other reasons, if we did later want to add something like core::random::ChaCha8RandSource, with its various other tradeoffs, we could do that.

Regarding ChaCha8Rand broadly, as others have mentioned, it's not seekable, and that's a rather important property for many things. That's not to say we shouldn't have it or something like it, but simply that it should sit neatly alongside options that are seekable.

Regarding ChaCha8Rand in particular, that algorithm is probably not how I'd prefer to solve the problem it's trying to solve (optimizing better for wider SIMD lanes). I'd probably prefer to use a natively wider block permutation, and thereby benefit from diffusion across all words, rather than adapt ChaCha by striping across multiple instances in an ad-hoc way.

Regarding the number of rounds, 8 is a kind of an unfortunate number. It's secure today, but with quite limited security margin. (Conversely, it's probably overkill by 2-4 rounds if we don't want security at all, i.e., are OK with some detectable patterns in the output that wouldn't matter for most non-cryptographic applications.)

Yes, yes, we're not going for secure here, but that's exactly the problem with this number. This will be easy to use, and it will not have obvious security flaws, so people will use it in places that at least some security is called for. Security needs aren't all or none, of course.

If we call it ChaCha8Source, then actually I think that's fine, because we could always add a ChaCha12Source (or a ChaCha4Source, or a ChaChaSource<const N: usize>), and people would know what they're getting.

But what I'd like to avoid is that we ship DeterministicSource at N rounds, then a couple years later look at how people are using it in practice, and decide that, actually, we'd prefer to have some security margin here, and then have to deprecate it and add a DeterministicMoreSecureSource. Or conversely, that we decide we really want it to be faster and deprecate it in favor of a DeterministicFasterLessSecureSource.

I'd rather people just know what they're getting, and for something meant to be deterministic, it just makes a lot of sense to name the algorithm.

hanna-kruppe · 2024-10-24T09:03:13Z

ChaCha4 is known to fail PractRand so it’s worse than the many PRNGs that pass it with smaller state size, smaller code size, similar or better performance on CPUs without SIMD, and less chance of being confused for something secure.

ChaCha6 may be acceptable for statistical purposes but the performance advantage over eight rounds is small. For long-output throughput, the maximum theoretical speed up is 8/6 = 1.33x, but this ignores the round-independent costs of per-block setup/finalization, internal buffer management, and actually doing something with the output. For shorter outputs such as single integers, the speedup will be even less because consuming a buffer in four or eight byte increments is a significant chunk of the work when filling the buffer takes less than a cycle per byte. For example, when I benchmark my chacha8rand implementation, the AVX2 backend is 2x faster than the SSE2 backend for reading several hundred bytes at a time but ~~(from memory)~~ only about 1.3x faster at reading individual u32s. (edit: I checked now that I'm on my computer again, the 1.3x was accurate, more precisely 1.32x)

Reducing rounds also doesn’t address the drawbacks of generating output in multiples of 64 bytes (some implementation complexity and increased state size if you buffer internally; terrible performance for small reads if you don’t). If ChaCha8 is considered overkill then the logical conclusion to me is an entirely different algorithm that isn’t block based, not fewer rounds.

hanna-kruppe · 2024-10-24T09:30:02Z

I'd probably prefer to use a natively wider block permutation, and thereby benefit from diffusion across all words, rather than adapt ChaCha by striping across multiple instances in an ad-hoc way.

Is there such an algorithm today? It’s not clear to me that you can achieve similar or better diffusion across a larger state size with fewer operations per byte/word. Plus, a larger block size means more live values to keep in registers, which disproportionately penalizes processors without 128+ bit SIMD. ISAs with 16+ general purpose registers can compute one ChaCha block at a time without spilling to the stack (or only very occasionally with 16 GPRs), but that’s not true when a single block is 4x larger.

ChrisDenton · 2024-10-24T12:34:33Z

Sorry if I've missed it but is there a particular reason we would prefer to use ChaCha? I mean, it makes sense in the context of this PR as a kind of placeholder for whatever the final algorithm is. But otherwise it seems like it's not really worth getting in the weeds over unless there isn't a more obviously fitting choice?

hanna-kruppe · 2024-10-24T16:52:36Z

It's an unresolved question in the tracking issue, there was a little discussion about it there and much more on the ACP. I don't get the impression that there's any consensus about it. Speaking only for myself, I did not see any alternative suggestion that's obviously better choice than some suitable ChaCha variant. It's just a well understood option in the space, and has enough knobs that you can get easily get into the weeds over.

joboet · 2024-10-28T14:21:24Z

I've now implemented buffering and added some tests. As for other algorithms, I'd prefer it if that discussion could be moved to the tracking issue?

library/core/src/random/deterministic.rs

traviscross · 2024-10-28T21:31:29Z

Nit:

ChaCha4 is known to fail PractRand so it’s worse than the many PRNGs that pass it...

The citation given does not support this claim. It says that ChaCha4 does pass Practrand (but that ChaCha3 does not):

Three rounds of ChaCha are not enough [1], but four rounds are enough to satisfy DIEHARDER, PractRand, and TestU01 [2].

[2] ChaCha(4) passed TestU01 small crush and full crush. It passed PractRand using up to 512 GB.

That same result is also reported here:

https://www.pcg-random.org/posts/more-practrand-passes.html

Lokathor · 2024-10-28T21:33:55Z

As the cited article says: If it's not cryptographic to begin with, why chacha of any round count at all?

hanna-kruppe · 2024-10-28T21:38:05Z

The citation given does not support this claim. It says that ChaCha4 does pass Practrand (but that ChaCha3 does not):

You're right, sorry for the confusion. I might have misread the "up to 512 GB" bit in the footnote (as "... but fails with more output").

joboet · 2024-11-04T18:29:49Z

I've implemented the suggestions, thank you everyone!

@joshtriplett I'm thinking about renaming from_seed to with_seed, since that IMHO fits better with other constructors (e.g. with_hasher) where with is used to specify parameters. Also, from sort-of implies value-preservation, which isn't really a thing here. Any thoughts?

library/core/src/random/deterministic.rs

library/core/tests/random.rs

Co-authored-by: Josh Triplett <[email protected]>

bors · 2025-01-27T07:22:40Z

☔ The latest upstream changes (presumably #135937) made this pull request unmergeable. Please resolve the merge conflicts.

joboet added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Oct 12, 2024

rustbot assigned joshtriplett Oct 12, 2024

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 12, 2024

This comment has been minimized.

Sign in to view

joboet force-pushed the deterministic_random branch from 9463326 to 23d15b6 Compare October 12, 2024 16:30

Noratrieb reviewed Oct 12, 2024

View reviewed changes

library/core/src/random/deterministic.rs Show resolved Hide resolved

Noratrieb reviewed Oct 12, 2024

View reviewed changes

hanna-kruppe reviewed Oct 12, 2024

View reviewed changes

library/core/src/random/deterministic.rs Outdated Show resolved Hide resolved

joboet added 2 commits October 28, 2024 15:18

core: cache the RNG output

f4a20b2

core: add RNG tests

4f63307

hanna-kruppe reviewed Oct 28, 2024

View reviewed changes

library/core/src/random/deterministic.rs Outdated Show resolved Hide resolved

traviscross reviewed Oct 28, 2024

View reviewed changes

library/core/src/random/deterministic.rs Show resolved Hide resolved

core: use a const for the block size, use usize instead of u8

2e401ba

joshtriplett reviewed Nov 9, 2024

View reviewed changes

library/core/src/random/deterministic.rs Outdated Show resolved Hide resolved

joshtriplett reviewed Nov 9, 2024

View reviewed changes

library/core/tests/random.rs Show resolved Hide resolved

core: improve comments

fa7ae2e

Co-authored-by: Josh Triplett <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core: implement `DeterministicRandomSource` #131607

core: implement `DeterministicRandomSource` #131607

joboet commented Oct 12, 2024

This comment has been minimized.

Noratrieb Oct 12, 2024

hanna-kruppe Oct 12, 2024

hanna-kruppe Oct 15, 2024 •

edited

Loading

juntyr commented Oct 12, 2024

joshtriplett commented Oct 12, 2024 •

edited

Loading

joshtriplett commented Oct 12, 2024

Lokathor commented Oct 13, 2024

programmerjake commented Oct 15, 2024

joshtriplett commented Oct 16, 2024 •

edited

Loading

joshtriplett commented Oct 23, 2024

hanna-kruppe commented Oct 23, 2024

programmerjake commented Oct 23, 2024

hanna-kruppe commented Oct 23, 2024

traviscross commented Oct 23, 2024 •

edited

Loading

hanna-kruppe commented Oct 24, 2024 •

edited

Loading

hanna-kruppe commented Oct 24, 2024

ChrisDenton commented Oct 24, 2024

hanna-kruppe commented Oct 24, 2024

joboet commented Oct 28, 2024

traviscross commented Oct 28, 2024 •

edited

Loading

Lokathor commented Oct 28, 2024

hanna-kruppe commented Oct 28, 2024

joboet commented Nov 4, 2024

bors commented Jan 27, 2025

core: implement DeterministicRandomSource #131607

Are you sure you want to change the base?

core: implement DeterministicRandomSource #131607

Conversation

joboet commented Oct 12, 2024

This comment has been minimized.

Noratrieb Oct 12, 2024

Choose a reason for hiding this comment

hanna-kruppe Oct 12, 2024

Choose a reason for hiding this comment

hanna-kruppe Oct 15, 2024 • edited Loading

Choose a reason for hiding this comment

juntyr commented Oct 12, 2024

joshtriplett commented Oct 12, 2024 • edited Loading

joshtriplett commented Oct 12, 2024

Lokathor commented Oct 13, 2024

programmerjake commented Oct 15, 2024

joshtriplett commented Oct 16, 2024 • edited Loading

joshtriplett commented Oct 23, 2024

hanna-kruppe commented Oct 23, 2024

programmerjake commented Oct 23, 2024

hanna-kruppe commented Oct 23, 2024

traviscross commented Oct 23, 2024 • edited Loading

hanna-kruppe commented Oct 24, 2024 • edited Loading

hanna-kruppe commented Oct 24, 2024

ChrisDenton commented Oct 24, 2024

hanna-kruppe commented Oct 24, 2024

joboet commented Oct 28, 2024

traviscross commented Oct 28, 2024 • edited Loading

Lokathor commented Oct 28, 2024

hanna-kruppe commented Oct 28, 2024

joboet commented Nov 4, 2024

bors commented Jan 27, 2025

core: implement `DeterministicRandomSource` #131607

core: implement `DeterministicRandomSource` #131607

hanna-kruppe Oct 15, 2024 •

edited

Loading

joshtriplett commented Oct 12, 2024 •

edited

Loading

joshtriplett commented Oct 16, 2024 •

edited

Loading

traviscross commented Oct 23, 2024 •

edited

Loading

hanna-kruppe commented Oct 24, 2024 •

edited

Loading

traviscross commented Oct 28, 2024 •

edited

Loading