Skip to content

core: implement DeterministicRandomSource #131607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

joboet
Copy link
Member

@joboet joboet commented Oct 12, 2024

ACP: rust-lang/libs-team#394
Tracking issue: #131606

The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG.

Also, I've made one minor change to the API proposed in the ACP: in accordance with C-GETTER, get_seed is now named seed.

r? @joshtriplett

@joboet joboet added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Oct 12, 2024
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 12, 2024
@rust-log-analyzer

This comment has been minimized.

ACP: rust-lang/libs-team#394
Tracking issue: rust-lang#131606

The version implemented here uses ChaCha8 as RNG. Whether this is the right choice is still open for debate, so I've included the RNG name in the feature gate to make sure people need to recheck their code if we change the RNG.

Also, I've made one minor change to the API proposed in the ACP: in accordance with [C-GETTER](https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter), `get_seed` is now named `seed`.
@joboet joboet force-pushed the deterministic_random branch from 9463326 to 23d15b6 Compare October 12, 2024 16:30
/// [RFC 8439](https://datatracker.ietf.org/doc/html/rfc8439), but with reduced rounds.
#[doc(hidden)]
#[unstable(feature = "deterministic_random_internals", issue = "none")] // Used for testing only.
pub mod chacha {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while it could always be optimized in the future, I think it should at least be optimized (with SIMD) before stabilization. it would be quite unfortunate to land an rng in std that's slower than the rand crate, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that note, it may be a good idea to borrow two tweaks from chacha8rand: defining the output to interleave multiple ChaCha20/8 blocks in the way 128-bit SIMD naturally does, and only adding the key to the final state matrix without also adding constants/counters/nonce to the other parts of the matrix. Both tweaks change the output, so they can't be done after stabilization, but they don't affect the quality and make SIMD implementations a little faster and simpler.

Copy link
Contributor

@hanna-kruppe hanna-kruppe Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I finally published my implementation of chacha8rand yesterday. I'm not suggesting that core should use this exact algorithm, let alone my implementation of it, but it may be useful as a reference for fast ChaCha8 SIMD implementations (including the aforementioned tweaks). Besides the portable scalar implementation I wrote SSE2, AVX2, AArch64 NEON, and wasm simd128 backends - much more platform coverage than rand_chacha currently has. The code is simpler and more self-contained than the chacha20 crate because it doesn't need to be parametrized over number of rounds and algorithm variants or integrate with crypto traits/infrastructure.

(Edit: But I did have to complicate some things for the sake of runtime feature detection, which core doesn't have right now. If static detection of SSE2 is good enough, the whole Backend indirection could be ripped out.)

@juntyr
Copy link
Contributor

juntyr commented Oct 12, 2024

If std provides a seeded PRNG, I think it should be one that can support reseeding, fast skipping, and forking in its API.

@joshtriplett
Copy link
Member

joshtriplett commented Oct 12, 2024

AFAICT, ChaCha supports random access /seeking, and you can always replace the seed.

@joshtriplett
Copy link
Member

Also, I've made one minor change to the API proposed in the ACP: in accordance with C-GETTER, get_seed is now named seed.

Seems reasonable.

@Lokathor
Copy link
Contributor

  1. Since the generator is supposed to have a stable output sequence, should we name the generator after the generation method, allowing for other generation methods/sequences to be added in the future?
  2. None of the type docs appear to state the generator's period before the sequence repeats. Can that be given somewhere?

@programmerjake
Copy link
Member

can accessors be added for counter_nonce and any byte buffer that might be added? this would allow seeking and [de]serializing the state.

@joshtriplett
Copy link
Member

joshtriplett commented Oct 16, 2024

can accessors be added for counter_nonce and any byte buffer that might be added? this would allow seeking and [de]serializing the state.

I think that should happen in a second pass. And ideally we should encapsulate all the state in a structure rather than requiring users to request and pass several separate values. (That's separate from the seed accessor, which still needs to exist.)

@joshtriplett
Copy link
Member

@traviscross Any thoughts about the proposed switch to chacha8rand? It seems reasonable to me, but additional scrutiny would help.

@hanna-kruppe
Copy link
Contributor

Note that chacha8rand does not support seeking, which is sometimes desirable. You can store the current position in the byte stream compactly, but you can’t seek forward or backwards to arbitrary positions by just adjusting a block counter. However, the output-affecting optimizations (interleaving several chacha8 blocks, skipping addition of constants/counters/nonce to the final state) introduced by chacha8rand are interesting even if seeking is desired.

@programmerjake
Copy link
Member

so why not use chacha8rand except extend the counter/nonce back to 128 bits and don't constantly reseed, it should be easily seekable then...

@hanna-kruppe
Copy link
Contributor

Right, that's what you'd do to get seeking. It just wouldn't be chacha8rand and should be called something different to avoid confusion.

@traviscross
Copy link
Contributor

traviscross commented Oct 23, 2024

Some thoughts:

First, as others have mentioned above, I really want to call this core::random::ChaCha8Source. This would minimize the possibility of regret. Among other reasons, if we did later want to add something like core::random::ChaCha8RandSource, with its various other tradeoffs, we could do that.

Regarding ChaCha8Rand broadly, as others have mentioned, it's not seekable, and that's a rather important property for many things. That's not to say we shouldn't have it or something like it, but simply that it should sit neatly alongside options that are seekable.

Regarding ChaCha8Rand in particular, that algorithm is probably not how I'd prefer to solve the problem it's trying to solve (optimizing better for wider SIMD lanes). I'd probably prefer to use a natively wider block permutation, and thereby benefit from diffusion across all words, rather than adapt ChaCha by striping across multiple instances in an ad-hoc way.

Regarding the number of rounds, 8 is a kind of an unfortunate number. It's secure today, but with quite limited security margin. (Conversely, it's probably overkill by 2-4 rounds if we don't want security at all, i.e., are OK with some detectable patterns in the output that wouldn't matter for most non-cryptographic applications.)

Yes, yes, we're not going for secure here, but that's exactly the problem with this number. This will be easy to use, and it will not have obvious security flaws, so people will use it in places that at least some security is called for. Security needs aren't all or none, of course.

If we call it ChaCha8Source, then actually I think that's fine, because we could always add a ChaCha12Source (or a ChaCha4Source, or a ChaChaSource<const N: usize>), and people would know what they're getting.

But what I'd like to avoid is that we ship DeterministicSource at N rounds, then a couple years later look at how people are using it in practice, and decide that, actually, we'd prefer to have some security margin here, and then have to deprecate it and add a DeterministicMoreSecureSource. Or conversely, that we decide we really want it to be faster and deprecate it in favor of a DeterministicFasterLessSecureSource.

I'd rather people just know what they're getting, and for something meant to be deterministic, it just makes a lot of sense to name the algorithm.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Oct 24, 2024

ChaCha4 is known to fail PractRand so it’s worse than the many PRNGs that pass it with smaller state size, smaller code size, similar or better performance on CPUs without SIMD, and less chance of being confused for something secure.

ChaCha6 may be acceptable for statistical purposes but the performance advantage over eight rounds is small. For long-output throughput, the maximum theoretical speed up is 8/6 = 1.33x, but this ignores the round-independent costs of per-block setup/finalization, internal buffer management, and actually doing something with the output. For shorter outputs such as single integers, the speedup will be even less because consuming a buffer in four or eight byte increments is a significant chunk of the work when filling the buffer takes less than a cycle per byte. For example, when I benchmark my chacha8rand implementation, the AVX2 backend is 2x faster than the SSE2 backend for reading several hundred bytes at a time but (from memory) only about 1.3x faster at reading individual u32s. (edit: I checked now that I'm on my computer again, the 1.3x was accurate, more precisely 1.32x)

Reducing rounds also doesn’t address the drawbacks of generating output in multiples of 64 bytes (some implementation complexity and increased state size if you buffer internally; terrible performance for small reads if you don’t). If ChaCha8 is considered overkill then the logical conclusion to me is an entirely different algorithm that isn’t block based, not fewer rounds.

@hanna-kruppe
Copy link
Contributor

I'd probably prefer to use a natively wider block permutation, and thereby benefit from diffusion across all words, rather than adapt ChaCha by striping across multiple instances in an ad-hoc way.

Is there such an algorithm today? It’s not clear to me that you can achieve similar or better diffusion across a larger state size with fewer operations per byte/word. Plus, a larger block size means more live values to keep in registers, which disproportionately penalizes processors without 128+ bit SIMD. ISAs with 16+ general purpose registers can compute one ChaCha block at a time without spilling to the stack (or only very occasionally with 16 GPRs), but that’s not true when a single block is 4x larger.

@ChrisDenton
Copy link
Member

Sorry if I've missed it but is there a particular reason we would prefer to use ChaCha? I mean, it makes sense in the context of this PR as a kind of placeholder for whatever the final algorithm is. But otherwise it seems like it's not really worth getting in the weeds over unless there isn't a more obviously fitting choice?

@hanna-kruppe
Copy link
Contributor

It's an unresolved question in the tracking issue, there was a little discussion about it there and much more on the ACP. I don't get the impression that there's any consensus about it. Speaking only for myself, I did not see any alternative suggestion that's obviously better choice than some suitable ChaCha variant. It's just a well understood option in the space, and has enough knobs that you can get easily get into the weeds over.

@joboet
Copy link
Member Author

joboet commented Oct 28, 2024

I've now implemented buffering and added some tests. As for other algorithms, I'd prefer it if that discussion could be moved to the tracking issue?

@traviscross
Copy link
Contributor

traviscross commented Oct 28, 2024

Nit:

ChaCha4 is known to fail PractRand so it’s worse than the many PRNGs that pass it...

The citation given does not support this claim. It says that ChaCha4 does pass Practrand (but that ChaCha3 does not):

Three rounds of ChaCha are not enough [1], but four rounds are enough to satisfy DIEHARDER, PractRand, and TestU01 [2].

[2] ChaCha(4) passed TestU01 small crush and full crush. It passed PractRand using up to 512 GB.

That same result is also reported here:

https://www.pcg-random.org/posts/more-practrand-passes.html

@Lokathor
Copy link
Contributor

As the cited article says: If it's not cryptographic to begin with, why chacha of any round count at all?

@hanna-kruppe
Copy link
Contributor

The citation given does not support this claim. It says that ChaCha4 does pass Practrand (but that ChaCha3 does not):

You're right, sorry for the confusion. I might have misread the "up to 512 GB" bit in the footnote (as "... but fails with more output").

@joboet
Copy link
Member Author

joboet commented Nov 4, 2024

I've implemented the suggestions, thank you everyone!

@joshtriplett I'm thinking about renaming from_seed to with_seed, since that IMHO fits better with other constructors (e.g. with_hasher) where with is used to specify parameters. Also, from sort-of implies value-preservation, which isn't really a thing here. Any thoughts?

Co-authored-by: Josh Triplett <[email protected]>
@bors
Copy link
Collaborator

bors commented Jan 27, 2025

☔ The latest upstream changes (presumably #135937) made this pull request unmergeable. Please resolve the merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.