perf(plonk): optimize prover with batch inversion and direct evaluations #1143
diegokingston wants to merge 11 commits into main from
Conversation
Kimi AI Review
Review Feedback
File: crates/provers/plonk/src/prover.rs
```rust
let zh_eval: Vec<_> = (0..degree).map(|i| zh_base[i % 4].clone()).collect();
let c: Vec<_> = p_eval
    .iter()
    .zip(zh_eval.iter())
```
Correctness:
- Mathematical Operations: In the `lp` lambda, ensure that operations involving `beta` and `gamma` are performed modulo the field order. Ensure that the field operations are sound and the modular arithmetic is respected.
- Edge Cases: The code relies on `beta` and `gamma` to prevent zero denominators for the permutation polynomial. Consider handling exceptional cases where these conditions might not be satisfied, especially when relying on randomness.
- Polynomial and FFT Implementations: The optimizations in evaluating `p_x` and `Z_H` directly via coset arithmetic are sound. However, ensure there are no rounding errors or omissions in coefficient calculations.
Security:
- Timing Side-Channels: The operations appear to be constant time, but ensure the division and inversion operations respect constant-time constraints.
- Zeroization of Sensitive Data: There is no explicit zeroization or sensitive data management observed. Ensure that sensitive variables are properly zeroized post-use.
- Cryptographically Secure Randomness: Random values like `beta` and `gamma` need a secure source. Ensure a cryptographically secure randomness source is used.
- Secret-dependent Branching: The code doesn't seem to have any secret-dependent branching.
- Hash Function Domain Separation: No hash functions are shown in the snippets; ensure proper domain separation is used where applicable.
Performance:
- Unnecessary Allocations: Good use of pre-allocation in vectors.
- Redundant Field Inversions: Efficient batch inversion utilized to prevent redundant operations; well done.
- MSM and FFT Efficiency: Recheck the assumptions for degree and root calculations to ensure no off-by-one errors.
Bugs & Errors:
-
Potential Panics or Unwraps:
expectusages could lead to panics if assumptions fail. Handle these carefully or ensure robust checks before.
-
Memory Safety Issues:
- Ensure no unsafe memory operations are performed without proper checks (not shown in provided snippet).
-
Off-by-one Errors:
- Check iterations and index usage, especially around
degreeandncalculations.
- Check iterations and index usage, especially around
-
Integer Overflow/Underflow:
- Check the field element scalar multiplication and power operations for potential overflow, especially in custom operations.
Code Simplicity:
- Overly Complex Implementations: The optimization is great but may introduce complexity. Ensure thorough testing and documentation.
- Duplicated Code: None noted; efficient reuse of logic observed, but ensure it's clearly documented.
- Poor Abstractions: Consider abstracting the pattern of batch inversions and FFT evaluations for reusability.
Conclusion:
The code is moving toward a performant and efficient implementation but has some concerns around robustness against unexpected cases and randomness guarantees. The reliance on `expect` in critical paths needs revisiting. Address these issues before merging.
Greptile Overview
Greptile Summary
This PR optimizes the PLONK prover by (1) switching the permutation polynomial construction (round 2) from per-step divisions to a single batch inversion, and (2) avoiding FFT evaluation for the identity polynomial p_x and the vanishing polynomial Z_H in round 3 by computing their coset values directly. The main correctness-sensitive changes are in crates/provers/plonk/src/prover.rs.
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| crates/provers/plonk/src/prover.rs | Replaces per-step divisions with batch inversion in round 2 and replaces FFT-based evaluations for X and Z_H with direct coset generation; correctness now depends on matching evaluate_offset_fft point ordering/root selection and on degree == 4*n for the Z_H shortcut. |
Sequence Diagram
sequenceDiagram
participant P as Prover
participant FE as FieldElement
participant FFT as Polynomial FFT
Note over P: Round 2 - build permutation z
P->>P: "Compute numerators and denominators"
P->>FE: "Batch invert denominators"
FE-->>P: "Return inverses"
P->>P: "Fold coefficients for z"
P->>FFT: "Interpolate z from coefficients"
Note over P: Round 3 - quotient construction
P->>FFT: "Evaluate polynomials on coset (degree = 4n)"
FFT-->>P: "Return evaluation vectors"
P->>P: "Generate identity polynomial values on coset"
P->>P: "Compute constraint and permutation evaluations"
P->>P: "Compute vanishing polynomial inverses via 4-cycle"
P->>P: "Multiply by vanishing inverses to get quotient evals"
P->>FFT: "Interpolate quotient polynomial from evaluations"
FFT-->>P: "Return quotient polynomial split lo/mid/hi"
crates/provers/plonk/src/prover.rs
```rust
// Compute all numerators and denominators first
let n_minus_1 = cpi.n - 1;
let mut numerators = Vec::with_capacity(n_minus_1);
let mut denominators = Vec::with_capacity(n_minus_1);

for i in 0..n_minus_1 {
    let (a_i, b_i, c_i) = (&witness.a[i], &witness.b[i], &witness.c[i]);
    let num = lp(a_i, &cpi.domain[i])
        * lp(b_i, &(&cpi.domain[i] * &cpi.k1))
        * lp(c_i, &(&cpi.domain[i] * &k2));
    let den = lp(a_i, &s1[i]) * lp(b_i, &s2[i]) * lp(c_i, &s3[i]);
    numerators.push(num);
    denominators.push(den);
}

// Batch invert all denominators at once (much faster than n-1 individual inversions)
FieldElement::inplace_batch_inverse(&mut denominators).expect(
    "batch inversion failed in permutation polynomial: beta and gamma should prevent zeros",
);

// Compute coefficients using the inverted denominators
let mut coefficients: Vec<FieldElement<F>> = Vec::with_capacity(cpi.n);
coefficients.push(FieldElement::one());

for i in 0..n_minus_1 {
    let factor = &numerators[i] * &denominators[i];
    let new_term = coefficients.last().expect("coefficients non-empty") * &factor;
    coefficients.push(new_term);
}
```
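For reference, Montgomery's trick (the idea behind `FieldElement::inplace_batch_inverse`) trades n field inversions for one inversion plus O(n) multiplications. A self-contained sketch over a toy 61-bit prime field; the modulus, the `u128` representation, and the helper names are illustrative only, not lambdaworks' API:

```rust
// Montgomery batch inversion over a toy prime field (p = 2^61 - 1, chosen only
// so u128 multiplication cannot overflow). Illustrative, not lambdaworks' API.
const P: u128 = (1 << 61) - 1;

fn mul(a: u128, b: u128) -> u128 {
    (a * b) % P
}

// One inversion via Fermat's little theorem: a^(p-2) mod p.
fn inv(a: u128) -> u128 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u128);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

// Inverts every element in place using a single call to `inv`.
fn batch_inverse(elems: &mut [u128]) {
    // prefix[i] = elems[0] * ... * elems[i]
    let mut prefix = Vec::with_capacity(elems.len());
    let mut acc = 1u128;
    for e in elems.iter() {
        acc = mul(acc, *e);
        prefix.push(acc);
    }
    // Invert the total product once, then peel off one inverse per element,
    // walking right to left.
    let mut inv_acc = inv(acc);
    for i in (0..elems.len()).rev() {
        let inv_i = if i == 0 { inv_acc } else { mul(inv_acc, prefix[i - 1]) };
        inv_acc = mul(inv_acc, elems[i]); // drop elems[i] from the running inverse
        elems[i] = inv_i;
    }
}

fn main() {
    let mut xs = vec![2u128, 3, 5, 7];
    batch_inverse(&mut xs);
    for (inv_x, x) in xs.iter().zip([2u128, 3, 5, 7]) {
        assert_eq!(mul(*inv_x, x), 1);
    }
}
```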
Permutation polynomial off-by-one
round_2 now computes n_minus_1 = cpi.n - 1 and only generates coefficients of length cpi.n by pushing 1 plus n-1 terms. The original loop was for i in 0..&cpi.n - 1, which produced exactly n coefficients only if the range was interpreted as 0..(n) (due to &cpi.n - 1 precedence). With the new explicit n-1, coefficients becomes length n, but this assumes the protocol wants exactly n Lagrange evaluations and that z[n-1] is the last term. If the intended range was actually 0..(n) (common in PLONK for building z over the whole domain and then separately enforcing boundary conditions), this change silently drops one multiplication step and changes p_z.
Please double-check the intended index range and coefficient count here against the protocol and existing tests/vectors, and adjust the loop bounds accordingly.
Also appears in: crates/provers/plonk/src/prover.rs:348-355 (numerator/denominator collection uses same n_minus_1).
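One cheap way to pin the intended invariant while that check happens (a sketch against the diff above, with `coefficients` and `cpi` as in round_2):

```rust
// Sketch: pin the expected coefficient count so a future change to the loop
// bounds fails loudly instead of silently altering p_z.
debug_assert_eq!(
    coefficients.len(),
    cpi.n,
    "permutation polynomial z must be built from exactly n Lagrange coefficients"
);
```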
crates/provers/plonk/src/prover.rs
```rust
// Optimization: p_x = X (identity polynomial), so p_x(offset * ω^i) = offset * ω^i.
// Generate the coset directly instead of using FFT.
let omega = F::get_primitive_root_of_unity(degree.trailing_zeros() as u64)
    .expect("primitive root exists for degree");
let p_x_eval: Vec<_> = (0..degree)
    .scan(offset.clone(), |current, _| {
        let val = current.clone();
        *current = &*current * &omega;
        Some(val)
    })
    .collect();
```
There was a problem hiding this comment.
Coset omega mismatch
p_x_eval is now generated using omega = F::get_primitive_root_of_unity(degree.trailing_zeros() as u64), and then iterating offset * omega^i. This must match the exact ordering/roots used by Polynomial::evaluate_offset_fft(..., Some(degree), offset) for the other polynomials in this round. If evaluate_offset_fft uses a different primitive root selection (or a different omega direction / bit-reversal ordering), p_x_eval will be on a different evaluation set/order than p_a_eval, p_b_eval, etc., breaking the quotient construction.
To make this safe, derive the root/ordering from the same helper used by evaluate_offset_fft (or replace this with a call that produces the same points deterministically), rather than re-deriving omega locally.
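Along the lines of that suggestion, a debug-build cross-check against the FFT helper itself would catch any ordering or root mismatch (a sketch; `p_x`, `degree`, and `offset` as in the surrounding round_3 code):

```rust
// Sketch: in debug builds, confirm the hand-rolled coset matches the point
// set and ordering produced by evaluate_offset_fft for the identity polynomial.
#[cfg(debug_assertions)]
{
    let fft_eval = Polynomial::evaluate_offset_fft(p_x, 1, Some(degree), offset)
        .expect("FFT evaluation of p_x must be within field's two-adicity limit");
    debug_assert_eq!(
        p_x_eval, fft_eval,
        "manual coset generation must match evaluate_offset_fft ordering"
    );
}
```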
```rust
// Optimization: Z_H(x) = x^n - 1 has only 4 distinct values on a coset of size 4n.
// On coset {offset * ω^i : i = 0..4n-1} where ω is a primitive 4n-th root:
//   Z_H(offset * ω^i) = offset^n * (ω^n)^i - 1
// Since ω^n is a 4th root of unity, (ω^n)^i cycles through 4 values.
let omega_4n = F::get_primitive_root_of_unity(degree.trailing_zeros() as u64)
    .expect("primitive root exists for degree");
let omega_n = omega_4n.pow(cpi.n as u64); // ω^n where ω is a 4n-th root; this is a 4th root of unity
let offset_to_n = offset.pow(cpi.n as u64);

// Compute the 4 distinct Z_H values and their inverses.
// Use a multiplication chain for small powers (faster than pow).
let omega_n_sq = &omega_n * &omega_n;
let omega_n_cubed = &omega_n_sq * &omega_n;
let mut zh_base = [
    &offset_to_n - FieldElement::<F>::one(),                  // i ≡ 0 (mod 4)
    &offset_to_n * &omega_n - FieldElement::<F>::one(),       // i ≡ 1 (mod 4)
    &offset_to_n * &omega_n_sq - FieldElement::<F>::one(),    // i ≡ 2 (mod 4)
    &offset_to_n * &omega_n_cubed - FieldElement::<F>::one(), // i ≡ 3 (mod 4)
];
FieldElement::inplace_batch_inverse(&mut zh_base)
    .expect("Z_H evaluations are non-zero on coset offset from roots of unity");

// Build the full evaluation vector by cycling through the 4 values.
let zh_eval: Vec<_> = (0..degree).map(|i| zh_base[i % 4].clone()).collect();
let c: Vec<_> = p_eval
```
Z_H shortcut assumes degree=4n
The new Z_H evaluation shortcut relies on degree = 4 * cpi.n so that omega_4n is a primitive 4n-th root and omega_n = omega_4n.pow(n) is a 4th root of unity. This is only valid under that exact relationship and when degree is exactly 4n in the FFT calls. If degree changes (the code even has a TODO about “factor of 4”), or if evaluate_offset_fft internally uses a different root/order, then zh_eval will be wrong and the quotient t will be computed incorrectly.
At minimum, add an assertion that degree == 4 * cpi.n here (and/or compute omega_n from the actual evaluation root used by the FFT helper), so the optimization can’t silently produce invalid proofs when degree is adjusted.
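A concise form of the guard the comment asks for (a sketch using the diff's names):

```rust
// Sketch: fail fast if the degree/domain relationship the shortcut relies on
// ever changes.
assert_eq!(
    degree,
    4 * cpi.n,
    "Z_H coset shortcut is only valid when the evaluation domain has size 4n"
);
```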
```rust
for constraint in &self.constraints {
    let a = assignments.get(&constraint.l);
    let b = assignments.get(&constraint.r);
    let c = assignments.get(&constraint.o);
```
Correctness
- The changes involve iteration over constraints with assignments. Ensure that `solve_hint` and `solve_constraint` are correctly transforming the assignments, maintaining invariant properties for each constraint.
- Consider edge cases where assignments may initially be empty or when constraints have zero or identity elements, which might produce unexpected results.
Security
- Verify that `solve_hint` and `solve_constraint` operate in constant time, especially if handling secret data, to prevent timing side-channel attacks.
- Check whether sensitive data within `assignments` requires zeroization after processing.
Performance
- Iterating over all constraints for each assignment might be inefficient depending on the size of `self.constraints`. Consider breaking out early, if possible, on each iteration after an assignment changes.
Bugs & Errors
- Confirm that `Variable` identifiers are correctly handled and that any possible conversion or retrieval of hash map values does not panic (e.g., due to missing constraints or invalid keys).
Code Simplicity
- The current loop structure with `assignments = solve_hint(assignments, constraint);` and `assignments = solve_constraint(assignments, constraint);` may introduce redundancy. If possible, merging these processes could simplify the implementation, reducing cognitive load.
Overall Comment
Ensure thorough testing for edge cases and data-related issues. Verify whether a more efficient constraint-solving methodology can be implemented to enhance performance without compromising correctness and security.
Address reviewer concerns from PR #1143:
- Add debug_assert verifying p_x_eval length matches FFT evaluation length
- Add debug_assert enforcing degree == 4n assumption for Z_H optimization
- Add clarifying comment explaining permutation polynomial index range is correct
- Add safety comments documenting the relationship between manual coset generation and FFT's primitive root derivation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```rust
// Compute coefficients using the inverted denominators
let mut coefficients: Vec<FieldElement<F>> = Vec::with_capacity(cpi.n);
coefficients.push(FieldElement::one());
```
Correctness
- Primitive Root Assumptions: You correctly assert the existence of a primitive root using `expect`, but ensure that the assumptions about the degree calculations hold in all cases, especially those related to trailing zeros and field limits.
- Edge Cases: Ensure that assumptions like `degree == 4 * n` are valid for all input scenarios beyond just the constraints in the comments. Changes in input domain characteristics could lead to edge cases being improperly handled.
Security
- Timing Side-Channels: Code involving field elements often needs to be constant time. Ensure that additions using `+` and multiplications with `*` on field elements are constant time to avoid leaks.
- SAFETY Assertions: Assertions are used for optimization checks, but consider using them in a way that does not leak information in a release build, or ensure they compile out entirely, depending on how critical they could be as attack vectors.
Performance
- Unnecessary Allocations: The direct use of `Vec::with_capacity(n_minus_1)` can be justified if filled without extraneous allocations. After ensuring correctness, look for other parts of the code that might perform unnecessary allocations tightly connected with mathematical operations.
Bugs & Errors
- Panics and Debug Assertions: Although debug assertions are useful, they should not reach a production code path that could panic under an invalid constraint. Consider a strategy to gracefully handle these conditions.
- Potential Panics with `expect`: While you use `expect` in multiple places, ensure that the message clearly describes the failure context and, if possible, check those conditions upfront to avoid panics.
Code Simplicity
- Complex Assertions: Consider encapsulating complex checks, like the degree match, into helper functions, potentially improving code readability and reuse.
Overall, assumptions in mathematical operations need further scrutiny to ensure that the constraints and debug assertions align with real-world and adversarial inputs. Consider potential refactorings or performance enhancements after addressing foundational issues.
Kimi AI Review
Here are my specific comments on the provided PR diff:
Overall, these changes look good. They improve performance by reducing unnecessary allocations, using more efficient algorithms, and avoiding redundant computations. The mathematical correctness and cryptographic security aspects seem unchanged. The code is also simplified and more maintainable.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1143 +/- ##
==========================================
+ Coverage 73.18% 73.33% +0.14%
==========================================
Files 176 177 +1
Lines 39491 39711 +220
==========================================
+ Hits 28903 29123 +220
Misses 10588 10588
- Round 2: Replace n-1 individual divisions with batch inversion (reduces field inversions from O(n) to O(1) via Montgomery's trick)
- Round 3: Skip FFT for vanishing polynomial Z_H = x^n - 1 (Z_H has only 4 distinct values on a 4n coset; compute directly)
- Round 3: Skip FFT for identity polynomial p_x = X (generate coset values directly as offset * ω^i)
- Use multiplication chains instead of pow() for small exponents
Prover Round 5:
- Compute zeta^n once, derive zeta^(n+2) and zeta^(2n+4) via multiplications
- Precompute alpha^2 instead of computing it inline
Verifier:
- Same zeta power optimization (eliminates 2 redundant exponentiations)
- Precompute alpha^2 for reuse
- Reuse zh_zeta (which equals zeta^n - 1) for l1_zeta computation
- Simplify partial_t_1 computation with precomputed powers
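The zeta-power optimization above is just a short multiplication chain. A sketch of the idea, with illustrative variable names (`zeta`, `n`, and `alpha` as in round 5 and the verifier):

```rust
// Sketch: one exponentiation, then cheap multiplications for the other powers.
let zeta_n = zeta.pow(n as u64);                      // the only pow() call
let zeta_n_plus_2 = &zeta_n * &zeta * &zeta;          // zeta^(n+2)
let zeta_2n_plus_4 = &zeta_n_plus_2 * &zeta_n_plus_2; // (zeta^(n+2))^2 = zeta^(2n+4)
let alpha_squared = &alpha * &alpha;                  // precomputed once, reused
```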
- Replace pow() in test_srs and identity_permutation with iterative multiplication
- Use scan instead of fold with unwrap in generate_domain
- Simplify generate_permutation_coefficients using iterator
- Simplify public_input_values using filter_map
- Fix redundant alpha*alpha computation in verifier (use precomputed alpha_squared)

- Use HashMap::len() instead of HashMap::keys().len()
- Use reference iteration (&self.constraints) instead of .iter()

Address reviewer concerns from PR #1143:
- Add debug_assert verifying p_x_eval length matches FFT evaluation length
- Add debug_assert enforcing degree == 4n assumption for Z_H optimization
- Add clarifying comment explaining permutation polynomial index range is correct
- Add safety comments documenting the relationship between manual coset generation and FFT's primitive root derivation
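The `generate_domain` change presumably mirrors the scan pattern used for `p_x_eval` elsewhere in this PR; a sketch of that idea (names illustrative):

```rust
// Sketch: build {ω^0, ω^1, ..., ω^(n-1)} with one multiplication per element
// and no unwrap inside the loop.
let domain: Vec<FieldElement<F>> = (0..n)
    .scan(FieldElement::one(), |current, _| {
        let val = current.clone();
        *current = &*current * &omega;
        Some(val)
    })
    .collect();
```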
5d88a65 to 5d17814
```rust
let p_pi_eval = Polynomial::evaluate_offset_fft(&p_pi, 1, Some(degree), offset)
    .expect("FFT evaluation of p_pi must be within field's two-adicity limit");
let p_x_eval = Polynomial::evaluate_offset_fft(p_x, 1, Some(degree), offset)
    .expect("FFT evaluation of p_x must be within field's two-adicity limit");
```
Correctness
- The code makes several assumptions about the existence of primitive roots which are verified using `expect`. While the debug assertions help ensure these assumptions hold, they can lead to panics in release builds if the conditions change. Consider replacing these with proper error handling or ensuring that the assumptions are always validated.
- The safety comment regarding the assumption `degree == 4 * n` should be clearly documented and verified elsewhere to ensure this invariant is maintained across future code changes.
Security
- There are no measures mentioned regarding cryptographically secure randomness, which may be important depending on where and how this code is used.
- There is no explicit mention of constant-time operations or handling of secret values without timing side-channels. Ensure that sensitive operations do not introduce timing attacks.
- The provided code does not indicate handling of zeroization of sensitive data.
Performance
- The use of debug assertions is a good practice, but also consider adding release-build checks or optimizations that do not rely solely on debug-build diagnostics.
- Ensure that unnecessary allocations are minimized, especially in high-demand functions such as FFT evaluations.
Bugs & Errors
- The use of `expect` statements can lead to panics, which should be avoided, especially in a cryptographic library.
- Relying on panic behavior is risky; consider using a more robust error handling mechanism.
Code Simplicity
- The explanations within the comments are informative and help in understanding the logic of the implementation. Be mindful that these assumptions and optimizations are correctly documented in user-facing documentation as well.
Consider addressing the above issues before merging to ensure robustness and security of the cryptographic operations.
Kimi AI Review
Here is my specific feedback on the provided PR diff:
Overall, the changes in this PR are focused on performance optimizations and code simplification. The optimizations are well-implemented and should significantly improve the performance of the PLONK prover and verifier. The code changes are also more idiomatic Rust, which makes the code easier to maintain and understand. Good job on the refactor! One minor suggestion: consider adding comments to explain the optimizations, especially for the more complex ones involving polynomial evaluations and field element computations. This will help other developers understand the rationale behind the changes.
@greptile
Kimi AI Review
Here are my specific comments on the PR diff:
Overall, the changes seem to improve performance and code conciseness while maintaining correctness. However, some TODO comments should be addressed with more detailed explanations. Adding comments to explain the optimizations and assumptions would also improve code readability and maintainability.
Greptile Overview
Greptile Summary
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| crates/provers/plonk/src/constraint_system/mod.rs | Minor refactor: public input values now collected via iterator/filter_map; no behavior change expected. |
| crates/provers/plonk/src/constraint_system/solver.rs | Minor loop refactor using assignments.len() and iterating by reference; no functional change. |
| crates/provers/plonk/src/prover.rs | Adds batch inversion in round 2, direct coset evaluations for p_x and Z_H, and small exponent mult chains; found a release-mode safety issue (debug_assert on degree==4n). |
| crates/provers/plonk/src/test_utils/utils.rs | Optimizes test SRS/domain/permutation generation with iterative multiplication; found a behavioral change where permutation slice is silently truncated, masking length requirements and risking panics depending on contents. |
| crates/provers/plonk/src/verifier.rs | Precomputes zeta powers and alpha^2 to avoid repeated exponentiation; no functional change detected. |
Sequence Diagram
sequenceDiagram
autonumber
participant Prover
participant Poly as Polynomial/FFT
participant Field as FieldElement
participant CS as CommitmentScheme
participant Verifier
Prover->>Prover: Round 2 - build permutation factors
Prover->>Field: collect denominators[0..n-2]
Prover->>Field: inplace_batch_inverse(denominators)
Prover->>Prover: accumulate z coefficients using num[i]*den_inv[i]
Prover->>Poly: Round 3 - evaluate witness polys on coset via evaluate_offset_fft
Prover->>Field: generate p_x_eval directly: offset * ω^i
Prover->>Field: compute zh_eval directly (4-cycle) and batch invert 4 values
Prover->>Poly: interpolate_offset_fft(c = p_eval * zh_eval)
Prover->>CS: commit to quotient/linearization polys
Verifier->>Verifier: compute challenges (β, γ, α, ζ, υ)
Verifier->>Field: precompute ζ^n, ζ^(n+2), ζ^(2n+4), α^2
Verifier->>Verifier: check constraints: p(ζ) == Z_H(ζ) * t(ζ)
Verifier->>CS: verify_batch openings at ζ
Verifier->>CS: verify single opening at ζω
Verifier-->>Verifier: accept/reject
```rust
debug_assert_eq!(
    degree,
    4 * cpi.n,
    "Z_H optimization requires degree == 4n; if degree formula changes, update this code"
);
```
Debug-only assumption check
degree == 4 * cpi.n is only enforced with debug_assert_eq!, so in release builds this optimization can silently produce incorrect Z_H evaluations if degree ever changes (the surrounding code even has a TODO about the factor of 4). This should be a real assert_eq! (or a checked branch that falls back to the FFT path) so invalid proofs can’t be produced in optimized builds.
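A sketch of the checked-branch variant suggested above; `evaluate_zh_on_coset_via_fft` is a hypothetical stand-in for the original FFT-based computation, not an existing helper:

```rust
// Sketch: enforce the invariant in release builds too, falling back to the
// FFT path when the 4n relationship does not hold. The then-branch reuses the
// batch-inverted zh_base array from the surrounding code.
let zh_eval: Vec<FieldElement<F>> = if degree == 4 * cpi.n {
    (0..degree).map(|i| zh_base[i % 4].clone()).collect()
} else {
    // Hypothetical fallback: evaluate Z_H = x^n - 1 on the coset via FFT.
    evaluate_zh_on_coset_via_fft(cpi.n, degree, offset)
};
```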
Additional Comments (1)
Path: crates/provers/plonk/src/test_utils/utils.rs
Line: 63:75
**Out-of-bounds on short permutation**
`generate_permutation_coefficients` now does `permutation.iter().take(n * 3).map(|&i| identity[i].clone())`. If `permutation.len() < 3n`, the `.take(3n)` still iterates `permutation.len()` elements, but any element value `i` outside `0..identity.len()` will panic with an out-of-bounds index. The previous code indexed `permutation[i]` for `i in 0..3n`, which would also panic when `permutation.len() < 3n`, but it *guaranteed* the input length requirement; the new code makes it easier to accidentally pass a shorter slice and get a panic that depends on the slice contents.
If the function requires `permutation.len() >= 3n` and all entries `< 3n`, it should enforce that (e.g., `assert_eq!(permutation.len(), 3*n)` / a bounds check) rather than silently truncating.
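A concise version of that enforcement (a sketch matching the names in the comment):

```rust
// Sketch: reject malformed inputs up front instead of silently truncating.
assert_eq!(
    permutation.len(),
    3 * n,
    "generate_permutation_coefficients expects a permutation of length 3n"
);
debug_assert!(
    permutation.iter().all(|&i| i < 3 * n),
    "every permutation entry must index into the identity"
);
```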
Benchmark Results
Environment 1: Apple M1, 8 cores (8 threads - no hyperthreading)
Environment 2: AMD Ryzen 5 PRO 3600 (6-Core, 12 threads)
```toml
[[bench]]
name = "criterion_prover"
harness = false
```
Correctness
- Mathematical Operations: Please ensure that all mathematical operations, especially modular arithmetic and field operations, handle edge cases correctly, like zero and identity elements.
- Polynomial and FFT implementations: Check for any off-by-one errors or incorrect assumptions about polynomial degrees and FFT lengths.
Security
- Constant Time Operations: Ensure that all cryptographic operations on secret data, including elliptic curve operations, are constant-time to avoid timing side-channels.
- Randomness: Verify that the randomness sourced for cryptographic operations is cryptographically secure.
- Hash Domain Separation: Ensure that all hash functions used in different contexts are domain-separated to prevent collision attacks across distinct domains.
Performance
- Inversions and Allocations: Examine the code for any unnecessary field inversions or allocations, particularly in critical paths like MSM and FFT.
Bugs & Errors
- Panic and Unwraps: Look for potential panic scenarios or the use of `unwrap()` that could lead to runtime crashes if assumptions are violated.
- Integer Overflow: Ensure that all arithmetic operations are checked to prevent integer overflow/underflow.
Code Simplicity
- Complex Implementations: Review the code for complex logic that can be broken down into simpler functions or better abstractions.
- Duplicated Code: Ensure there is no duplicated logic across the codebase that can be abstracted into shared functions.
```rust
}

criterion_group!(benches, bench_plonk_prover);
criterion_main!(benches);
```
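For orientation, a Criterion harness of this shape typically looks like the sketch below; `setup_for_size` and the exact `prove` signature are hypothetical stand-ins for whatever the PR's benchmark file actually uses:

```rust
// Hypothetical skeleton of the prover benchmark; the setup helper and the
// prove() call are stand-ins, not the PR's real code.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

fn bench_plonk_prover(c: &mut Criterion) {
    let mut group = c.benchmark_group("plonk_prover");
    for size in [16usize, 64, 256, 512] {
        // Build circuit, witness, and SRS outside the timed closure so that
        // only proving is measured.
        let (prover, witness, public_input, common, vk) = setup_for_size(size);
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, _| {
            b.iter(|| prover.prove(&witness, &public_input, &common, &vk));
        });
    }
    group.finish();
}

criterion_group!(benches, bench_plonk_prover);
criterion_main!(benches);
```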
Correctness
- Mathematical operations: No issues are evident in the code snippet regarding modular or field arithmetic directly, as operations seem to rely on library calls which presumably handle these safely. However, ensure that `FieldElement::from` and multiplication (`&x * &e`) wrap correctly under the modulus.
- Edge cases: It's unclear how `test_witness_size` and `test_common_preprocessed_input_size` handle edge values like zero, which could be significant if they lead to zero-constructed polynomials or zero division during setup.
Security
- Timing side-channels: Ensure functions such as `Kzg::new` and `prover.prove(...)` are constant-time, especially operations within them that handle private data.
- Zeroization: No explicit zeroization is present for sensitive data. This might be outside the scope of the benchmark, but sensitive cryptographic material should be zeroized after use.
- Randomness: The `TestRandomFieldGenerator` should clearly and securely initialize to prevent predictable randomness during proofs.
- Hash functions: Verify that any hash function usage respects domain separation, even though it's not shown here.
Performance
- Allocations: No unnecessary allocation is evident in this snippet beyond creating vectors, which appear intentional.
- Redundant operations: Invocations like `FieldElement::from(4_u64)` can be pre-computed outside loops for efficiency, if possible.
- FFT efficiency: Not directly applicable here, but ensure FFTs in `setup` or proofs are optimized for the sizes used (e.g., power-of-2 optimizations).
Bugs & Errors
- Panics/unwraps: No explicit panics or unwraps appear in the code. Verify any underlying library functions also handle input safely without panicking.
- Memory safety: Rust's borrow checker typically ensures memory safety; allocations with `vec![x, y]` imply heap usage without apparent risk.
- Integer overflow: `FieldElement::from` should inherently handle overflow safely if used with field moduli constraints.
- Off-by-one errors: Nothing immediately evident as off-by-one, though ensure looping constructs in `prover.prove()` accommodate index ranges correctly.
Code Simplicity
- Complexity: The implementation seems straightforward for benchmarking.
- Duplicated code: None observed.
- Abstractions: Usage of the Criterion library aids abstraction, but ensure any cryptographic process like `Kzg::new()` isn't overly complicated or misused. Consider whether `TestRandomFieldGenerator` could inherit from a more general secure random generator, reducing test complexity.
Overall, this code snippet is focused on benchmarking and setup, but more scrutiny is needed on functions like setup() and prover.prove(...) that are not shown but are critical for correctness and security. The code should be more robust regarding potential side-channel attacks and sensitive data handling for production use. Ensure to thoroughly test elliptic curve operations, setups, and snarks within those unshown methods.
| "p_x_eval length must match FFT evaluation length" | ||
| ); | ||
|
|
||
| let p_constraints_eval: Vec<_> = p_a_eval |
Correctness:
- Line 580: Ensure that polynomial blinding and commitment scheme operations follow correct modular arithmetic over the field, especially with respect to the degree and offset.
- FFT Evaluations: The code seems to compress multiple polynomial evaluations which are expected; ensure all polynomials used in FFT are correct and the evaluations are accurately accounted for.
Security:
- Constant-time Operations: The use of parallel execution can introduce side channels due to non-constant-time operations. Ensure the operations involving secrets are constant-time to prevent timing attacks.
Performance:
- Parallel Execution: The implementation attempts to parallelize FFT and commitment operations, which is good for performance, but the effectiveness needs to be measured to ensure this overhead doesn't outweigh the performance gains.
Bugs & Errors:
- Error Handling: The `expect` statements used in FFT operations imply potential panics if preconditions are not met. Consider alternative error handling to prevent abrupt failures.
Code Simplicity:
- Parallelization Complexity: While the parallel implementation can improve performance, it increases code complexity. Ensure that the added complexity results in measurable performance gains without introducing errors.
Additional Considerations:
- Zeroization: Ensure any sensitive data like secrets in fields or during commitments are zeroized from memory post-operation to prevent leakage.
- Hash Functions: If hash functions are used, ensure proper domain separation to avoid hash collisions or application errors. Ensure any random field generation uses cryptographically secure randomness.
Overall, the changes introduce potential performance improvements but raise concerns about security and complexity that need to be addressed before merging.
```rust
        let verifier = Verifier::new(kzg);
        assert!(verifier.verify(&proof, &public_input, &common, &vk));
    }
}
```
Correctness
- The function `test_common_preprocessed_input_size` uses `unwrap` on `FrField::get_primitive_root_of_unity(log_n)`. This can cause a panic if the root does not exist; given that all inputs are powers of two and the field specifics are correct, this is likely okay, though verifying it through tests or additional checks would enhance robustness.
- Similarly, `Polynomial::interpolate_fft::<FrField>(&ql_vals).unwrap()` and others could panic if interpolation fails. Consider handling errors gracefully.
Security
- Timing Side-Channels: Ensure mathematical operations on `FieldElement` involving secret data are constant-time to prevent side-channel attacks. This is not verifiable through a static code review without further context or implementation details.
- Cryptographically Secure Randomness: The usage of `TestRandomFieldGenerator` as a randomness source should be scrutinized to ensure it's replaced with a cryptographically secure generator in production.
- Proper Zeroization: There is no indication of explicit zeroization of sensitive data before deallocation.
Performance
- The code uses `vec![0; n * 3]` to create the `permutation` vector, which may cause unnecessary memory allocation since only a subset of indices is modified. Consider using `with_capacity` to preallocate memory without initialization when only parts of the vector are initialized explicitly.
Bugs & Errors
- The code ensures that `n` is at least 4 and a power of two. If any validations change in the future, ensure tests are retained to verify this.
Code Simplicity
- Duplicated Code: The unit test sequence for different `n` values is repetitive. Consider using parameterized tests to avoid code duplication.
Overall, while the code logic seems generally sound, the areas noted with unwraps, unnecessary allocations, and potential branching could be refined to handle error cases more gracefully and improve security in a production environment.
Consider addressing these issues before merging.
@greptile
Greptile Overview
Greptile Summary
This PR implements significant performance optimizations to the PLONK prover by reducing cryptographic field operations.
Key Optimizations:
Testing & Infrastructure:
The optimizations are mathematically sound and leverage domain-specific properties of the PLONK protocol. However, previous review threads have identified critical assumptions (particularly `degree == 4n`) that need verification.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| crates/provers/plonk/src/prover.rs | Major performance optimizations: batch inversion in round_2, direct Z_H/p_x evaluation in round_3, parallel FFT evaluations, and multiplication chains for small exponents. Previous threads identified critical assumptions about degree==4n that need verification. |
| crates/provers/plonk/src/verifier.rs | Mirrors prover optimizations: precomputes zeta powers efficiently using multiplication chains instead of repeated pow() calls, and reuses alpha_squared. Straightforward performance improvements with no correctness concerns. |
| crates/provers/plonk/Cargo.toml | Adds optional rayon dependency for parallelization, criterion for benchmarking, and defines the parallel feature flag. All changes are appropriate for the performance optimization work. |
| crates/provers/plonk/src/test_utils/circuit_large.rs | New test utilities for generating large circuits by padding circuit_1 with identity gates. Includes comprehensive tests at multiple sizes (n=16, 64, 256, 512, 4096). Implementation is clean and well-tested. |
Sequence Diagram
sequenceDiagram
participant Prover
participant Round1
participant Round2
participant Round3
participant Round4
participant Round5
Prover->>Round1: Commit wire polynomials (a, b, c)
Note over Round1: Parallel commitments (3x)
Round1-->>Prover: a_1, b_1, c_1
Prover->>Round2: Sample β, γ from transcript
Prover->>Round2: Build permutation polynomial z
Note over Round2: Batch inversion (n-1 denominators)
Round2-->>Prover: z_1
Prover->>Round3: Sample α from transcript
Prover->>Round3: Compute quotient polynomial t
Note over Round3: Parallel FFT evaluations (15 polys)<br/>Direct Z_H computation (4 values)<br/>Direct p_x generation (no FFT)
Round3-->>Prover: t_lo_1, t_mid_1, t_hi_1
Prover->>Round4: Sample ζ from transcript
Prover->>Round4: Evaluate polynomials at ζ
Note over Round4: Multiplication chains for zeta powers
Round4-->>Prover: a_ζ, b_ζ, c_ζ, s1_ζ, s2_ζ, z_ζω
Prover->>Round5: Sample υ from transcript
Prover->>Round5: Generate opening proofs
Note over Round5: Parallel batch opening
Round5-->>Prover: w_ζ_1, w_ζω_1
Updated benchmark results
Apple M1 8 cores with
AMD Ryzen 5 PRO 3600 (6-Core, 12 threads) with
Kimi AI Review
Here are my specific comments on the PR:
Overall, the PR makes several optimizations for performance while maintaining correctness. The parallelism and batch operations could provide significant speedups. However, ensure thorough testing, especially for the parallelized code. The changes seem well-reviewed and should improve performance.
- Round 2: Replace n-1 individual divisions with batch inversion (reduces field inversions from O(n) to O(1) via Montgomery's trick)
- Round 3: Skip FFT for vanishing polynomial Z_H = x^n - 1 (Z_H has only 4 distinct values on a 4n coset; compute directly; see the short derivation below)
- Round 3: Skip FFT for identity polynomial p_x = X (generate coset values directly as offset * ω^i)
- Use multiplication chains instead of pow() for small exponents
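The 4-value claim for Z_H follows directly from the comment in the code: on the coset {offset * ω^i} with ω a primitive 4n-th root of unity,

Z_H(offset * ω^i) = (offset * ω^i)^n - 1 = offset^n * (ω^n)^i - 1,

and since ω^(4n) = 1, we have (ω^n)^(i+4) = (ω^n)^i * ω^(4n) = (ω^n)^i, so the value depends only on i mod 4. The four base values can therefore be batch-inverted once and cycled across all 4n evaluation points.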