Trying to figure out the massive slowdowns (up to 4x) that `crypto-primes`
experiences for boxed uints. This could be the reason for the slowdowns in
`RSA` as well.
Public changes:
- Added `Monty::div_by_2_assign()` (with a blanket impl).
- Added `BoxedUint::inv_mod2k_vartime()`.
- Made `BoxedUint::inv_mod2k()` public.
- Added `Monty::Multiplier` associated type and
`Monty::copy_montgomery_from()` to assist with tight loops
(specifically, Lucas test in `crypto-primes`).
- Cleaned up AMM, added comments and references, and reduced the size of
the internal buffer from 2N to N. Also made it `const fn`. Closes #782
**Note:** the multiplier for `Uint` is called `DynMontyMultiplier`. Not
happy with the name, but we already have `MontyMultiplier` as a trait,
and it clashes.
**Note:** I'm not sure about the exact way `MontyMultiplier` is exposed or
about the naming, and also not sure how hazmat we want to make them.
Potentially AMM can be exposed too, but it would be good to wrap the
results in some struct that will propagate the "reduction level". Not
for this PR, I need to finalize the minimum viable solution.
Fixes:
- Fixed a bug in `BoxedUnsatInt::to_uint()` which created a 64-bit
number instead of a 32-bit one on 32-bit targets.
Internal:
- Added tests for `BoxedUint::inv_mod2k()` and `inv_mod2k_vartime()`.
- Removed allocations inside the loop in `BoxedUint::inv_mod2k()`.
- Used `inv_mod2k_vartime()` in `BoxedMontyParams::new_vartime()`
and `new()`, since it is only vartime in `k`, which is fixed.
- `new_vartime()` can be made even faster (~15% for Uint, 25% for Boxed)
if we make a variant of `inv_mod2k` that is vartime in both arguments.
Currently added in the commit as `inv_mod2k_full_vartime()`
(crate-private). **Can be removed if that's too much detail.**
- Removed an unnecessary allocation in Add/SubAssign of
`BoxedMontgomeryForm`.
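
To illustrate what "vartime only in `k`" means, here is a minimal single-limb sketch of an inverse modulo 2^k. It is not the code in this PR: it uses Newton (Hensel) lifting on a plain `u64`, and the free function `inv_mod2k` below is a hypothetical stand-in for the `BoxedUint` method. The point is only that the number of iterations depends on `k` and never on the value being inverted.

```rust
/// Computes the inverse of an odd `a` modulo 2^k, for 1 <= k <= 64.
/// Illustrative sketch only (single limb, Newton iteration).
fn inv_mod2k(a: u64, k: u32) -> u64 {
    assert!(a & 1 == 1, "only odd numbers are invertible modulo 2^k");
    assert!((1..=64).contains(&k), "k must be in 1..=64");

    // For any odd a, a * a = 1 (mod 8), so x = a is correct to 3 bits.
    let mut x = a;
    let mut correct_bits = 3;

    // Each Newton step x <- x * (2 - a * x) doubles the number of
    // correct low bits. The loop count depends only on k.
    while correct_bits < k {
        x = x.wrapping_mul(2u64.wrapping_sub(a.wrapping_mul(x)));
        correct_bits *= 2;
    }

    // Keep only the low k bits.
    if k == 64 { x } else { x & ((1u64 << k) - 1) }
}

fn main() {
    let inv = inv_mod2k(3, 8);
    // 3 * inv should be 1 modulo 2^8.
    println!("{} {}", inv, (3 * inv) % 256);
}
```

Since `k` is fixed by the limb count when Montgomery parameters are built, a loop like this leaks nothing about the modulus itself through its iteration count.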
Performance notes:
- `BoxedUint::div_by_2()` uses `div_by_2_assign()` because it is faster
and does not allocate.
- `Uint::div_by_2()` uses the same approach, gets rid of one addition
and one `shr1()`, so it is marginally faster (~10%).
- As expected, because of the `inv_mod2k_vartime()` usage,
`MontyParams::new()` and `new_vartime()` became massively faster (~10x for
Uint, ~15x for Boxed, 4096 bits).
- I tried using AMM in `Uint`, but it leads to performance degradation
for smaller uints (U256). So for now we'll keep the status quo with
`Uint` using multiply + reduce. Worth investigating later.
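
For readers unfamiliar with modular halving, the idea behind `div_by_2()` can be sketched on a single `u64` as follows. This is an assumption-laden illustration, not the implementation in this PR: the function name `div_by_2_mod` is hypothetical, and it handles one limb only. Because Montgomery form is linear in the represented value, halving the Montgomery representation modulo an odd modulus also halves the value it represents.

```rust
/// Returns x / 2 modulo an odd `modulus`, assuming x < modulus.
/// Illustrative single-limb sketch; branch-free on the value of x.
fn div_by_2_mod(x: u64, modulus: u64) -> u64 {
    debug_assert!(modulus & 1 == 1, "modulus must be odd");
    debug_assert!(x < modulus, "x must be reduced");

    // All ones if x is odd, all zeros if x is even.
    let mask = (x & 1).wrapping_neg();

    // If x is odd, x + modulus is even and congruent to x.
    let (sum, carry) = x.overflowing_add(modulus & mask);

    // Shift right by one, feeding the carry back in as the top bit.
    (sum >> 1) | ((carry as u64) << 63)
}

fn main() {
    // 9 * 2 = 18 = 7 (mod 11)
    println!("{}", div_by_2_mod(7, 11));
}
```

The result is modified in place in the real `div_by_2_assign()`, which is what avoids the allocation for boxed integers; the sketch above only shows the arithmetic.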