
perf(plonk): optimize prover with batch inversion and direct evaluations #1143

Open

diegokingston wants to merge 11 commits into main from perf/plonk-optimizations

Conversation

@diegokingston
Collaborator

  • Round 2: Replace n-1 individual divisions with batch inversion (reduces field inversions from O(n) to O(1) via Montgomery's trick; see the sketch after this list)

  • Round 3: Skip FFT for vanishing polynomial Z_H = x^n - 1 (Z_H has only 4 distinct values on a 4n coset; compute directly)

  • Round 3: Skip FFT for identity polynomial p_x = X (generate coset values directly as offset * ω^i)

  • Use multiplication chains instead of pow() for small exponents
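For readers unfamiliar with Montgomery's trick, here is a minimal standalone sketch over a toy prime field (illustrative only; the PR itself uses lambdaworks' FieldElement::inplace_batch_inverse):

```rust
// Montgomery's batch-inversion trick in a toy prime field F_101.
const P: u64 = 101;

// Square-and-multiply exponentiation mod P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

// Fermat inversion: the single field inversion the trick pays for.
fn inv_mod(a: u64) -> u64 {
    pow_mod(a, P - 2)
}

// Invert every element of `xs` with one inversion plus O(n) multiplications.
fn batch_inverse(xs: &mut [u64]) {
    // prefix[i] = xs[0] * xs[1] * ... * xs[i]
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs.iter() {
        acc = acc * x % P;
        prefix.push(acc);
    }
    // Invert the total product once, then peel one factor off per step, backwards.
    let mut inv = inv_mod(acc);
    for i in (0..xs.len()).rev() {
        let xi = xs[i];
        xs[i] = if i == 0 { inv } else { inv * prefix[i - 1] % P };
        inv = inv * xi % P; // now the inverse of prefix[i - 1]
    }
}

fn main() {
    let mut xs = [2u64, 3, 5, 7];
    batch_inverse(&mut xs);
    for (x, inv) in [2u64, 3, 5, 7].iter().zip(xs.iter()) {
        assert_eq!(x * inv % P, 1); // each entry was replaced by its inverse
    }
}
```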

@diegokingston requested a review from a team as a code owner February 5, 2026 21:20
@github-actions

github-actions bot commented Feb 5, 2026

Kimi AI Review

Review Feedback

File: crates/provers/plonk/src/prover.rs

Line 334
  • Issue: The original code snippet has a potential performance issue with individual inversions of denominators, which can be optimized by using batch inversion.
  • Suggestion: The updated code correctly uses batch inversion to improve performance. This is a good optimization.
Line 386-397
  • Issue: The original code snippet includes unnecessary polynomial evaluations and allocations.
  • Suggestion: The updated code correctly optimizes the polynomial evaluation by generating the coset directly, which is more efficient. This is a good optimization.
Line 430-438
  • Issue: The original code snippet includes unnecessary polynomial evaluations and allocations.
  • Suggestion: The updated code correctly optimizes the polynomial evaluation by generating the coset directly, which is more efficient. This is a good optimization.
Line 505-523
  • Issue: The original code snippet includes unnecessary polynomial evaluations and allocations.
  • Suggestion: The updated code correctly optimizes the polynomial evaluation by generating the coset directly, which is more efficient. This is a good optimization.

General Feedback

The changes in the PR diff show significant improvements in terms of performance optimization. The use of batch inversion and direct polynomial evaluation on cosets reduces the computational complexity and memory usage. These optimizations are well-implemented and should have a positive impact on the overall performance of the prover.

One minor suggestion is to add comments explaining the optimizations, especially for the batch inversion and direct polynomial evaluation. This will help maintainers and other developers understand the rationale behind these changes.

Overall, the changes are well-implemented and should be merged.

@github-actions bot left a comment

Code review by ChatGPT

let zh_eval: Vec<_> = (0..degree).map(|i| zh_base[i % 4].clone()).collect();
let c: Vec<_> = p_eval
    .iter()
    .zip(zh_eval.iter())

Correctness:

  1. Mathematical Operations:

    • In the lp closure, ensure that operations involving beta and gamma are performed modulo the field order and that the field arithmetic is sound.
  2. Edge Cases:

    • The code relies on beta and gamma to prevent zero denominators for the permutation polynomial. Consider handling exceptional cases where these conditions might not be satisfied, especially when relying on randomness.
  3. Polynomial and FFT Implementations:

    • The optimizations in evaluating p_x and Z_H directly via coset arithmetic are sound. However, ensure there are no omissions in the coefficient calculations.

Security:

  1. Timing Side-Channels:

    • The operations appear to be constant time, but ensure the division and inversion operations respect constant-time constraints.
  2. Zeroization of Sensitive Data:

    • There is no explicit zeroization or sensitive data management observed. Ensure that sensitive variables are properly zeroized post-use.
  3. Cryptographically Secure Randomness:

    • Random values like beta and gamma need a secure source. Ensure a cryptographically secure randomness source is used.
  4. Secret-dependent Branching:

    • The code doesn't seem to have any secret-dependent branching.
  5. Hash Function Domain Separation:

    • No hash functions are shown in the snippets; ensure proper domain separation is used where applicable.

Performance:

  1. Unnecessary Allocations:

    • Good use of pre-allocation in vectors.
  2. Redundant Field Inversions:

    • Efficient batch inversion utilized to prevent redundant operations; well-done.
  3. MSM and FFT Efficiency:

    • Recheck the assumptions for degree and root calculations to ensure no off-by-one errors.

Bugs & Errors:

  1. Potential Panics or Unwraps:

    • expect usages could lead to panics if assumptions fail. Handle these carefully or ensure robust checks beforehand.
  2. Memory Safety Issues:

    • Ensure no unsafe memory operations are performed without proper checks (not shown in provided snippet).
  3. Off-by-one Errors:

    • Check iterations and index usage, especially around degree and n calculations.
  4. Integer Overflow/Underflow:

    • Check the field element scalar multiplication and power operations for potential overflow, especially in custom operations.

Code Simplicity:

  1. Overly Complex Implementations:

    • Optimization is great but may introduce complexity. Ensure thorough testing and documentation.
  2. Duplicated Code:

    • None noted; efficient reuse of logic observed, but ensure it's clearly documented.
  3. Poor Abstractions:

    • Consider abstracting pattern of batch inversions and FFT evaluations for reusability.

Conclusion:

The code appears to be moving towards performant and efficient optimization but has some concerns around robustness against unexpected cases and randomness guarantees. The reliance on expect for critical paths needs revisiting. Address these issues before merging.

@greptile-apps

greptile-apps bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

This PR optimizes the PLONK prover by (1) switching the permutation polynomial construction (round 2) from per-step divisions to a single batch inversion, and (2) avoiding FFT evaluation for the identity polynomial X and the vanishing polynomial Z_H(x)=x^n-1 in round 3 by generating coset values directly.

The main correctness-sensitive changes are in crates/provers/plonk/src/prover.rs, where round-3 quotient construction now depends on the manually-generated coset ordering/root matching the ordering used by Polynomial::evaluate_offset_fft, and the Z_H shortcut assumes degree is exactly 4*n (as currently set) so that Z_H cycles through four values on the 4n coset.
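To make the 4-cycle claim concrete: with degree = 4n, Z_H(offset · ω^i) = offset^n · (ω^n)^i − 1, and ω^n has multiplicative order 4, so only four distinct values ever occur. A toy check in F_17 (illustrative numbers, not lambdaworks code):

```rust
// Verify the Z_H 4-cycle in F_17: n = 2, coset size 4n = 8,
// ω = 2 (a primitive 8th root of unity mod 17), offset = 3.
const P: u64 = 17;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

fn main() {
    let (n, omega, offset) = (2u64, 2u64, 3u64);
    let degree = 4 * n;

    // Direct evaluation: Z_H(offset * ω^i) = (offset * ω^i)^n - 1.
    let direct: Vec<u64> = (0..degree)
        .map(|i| (pow_mod(offset * pow_mod(omega, i) % P, n) + P - 1) % P)
        .collect();

    // Shortcut: offset^n * (ω^n)^i - 1, where ω^n is a 4th root of unity,
    // so the values repeat with period 4.
    let (omega_n, offset_n) = (pow_mod(omega, n), pow_mod(offset, n));
    let base: Vec<u64> = (0..4u64)
        .map(|i| (offset_n * pow_mod(omega_n, i) % P + P - 1) % P)
        .collect();
    let cycled: Vec<u64> = (0..degree).map(|i| base[(i % 4) as usize]).collect();

    assert_eq!(direct, cycled); // both are [8, 1, 7, 14, 8, 1, 7, 14]
}
```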

Confidence Score: 2/5

  • This PR is not safe to merge until the manual coset evaluation paths are proven to match the FFT evaluation ordering and the Z_H shortcut assumptions are enforced.
  • Round 3 now mixes FFT-produced evaluation vectors with manually-generated vectors (for X and Z_H). If the primitive root selection or ordering differs from evaluate_offset_fft, quotient construction will be incorrect and proofs will fail verification. There is also an implicit assumption that degree == 4*n for the 4-cycle Z_H optimization, but the code contains a TODO suggesting degree may change later.
  • crates/provers/plonk/src/prover.rs

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/provers/plonk/src/prover.rs | Replaces per-step divisions with batch inversion in round 2 and replaces FFT-based evaluations for X and Z_H with direct coset generation; correctness now depends on matching evaluate_offset_fft point ordering/root selection and on degree == 4*n for the Z_H shortcut. |

Sequence Diagram

sequenceDiagram
    participant P as Prover
    participant FE as FieldElement
    participant FFT as Polynomial FFT

    Note over P: Round 2 - build permutation z
    P->>P: "Compute numerators and denominators"
    P->>FE: "Batch invert denominators"
    FE-->>P: "Return inverses"
    P->>P: "Fold coefficients for z"
    P->>FFT: "Interpolate z from coefficients"

    Note over P: Round 3 - quotient construction
    P->>FFT: "Evaluate polynomials on coset (degree = 4n)"
    FFT-->>P: "Return evaluation vectors"
    P->>P: "Generate identity polynomial values on coset"
    P->>P: "Compute constraint and permutation evaluations"
    P->>P: "Compute vanishing polynomial inverses via 4-cycle"
    P->>P: "Multiply by vanishing inverses to get quotient evals"
    P->>FFT: "Interpolate quotient polynomial from evaluations"
    FFT-->>P: "Return quotient polynomial split lo/mid/hi"

@greptile-apps bot left a comment

1 file reviewed, 3 comments

Comment on lines 343 to 371
// Compute all numerators and denominators first
let n_minus_1 = cpi.n - 1;
let mut numerators = Vec::with_capacity(n_minus_1);
let mut denominators = Vec::with_capacity(n_minus_1);

for i in 0..n_minus_1 {
    let (a_i, b_i, c_i) = (&witness.a[i], &witness.b[i], &witness.c[i]);
    let num = lp(a_i, &cpi.domain[i])
        * lp(b_i, &(&cpi.domain[i] * &cpi.k1))
        * lp(c_i, &(&cpi.domain[i] * &k2));
    // den != 0 with overwhelming probability because beta and gamma are random elements.
    let den = lp(a_i, &s1[i]) * lp(b_i, &s2[i]) * lp(c_i, &s3[i]);
    numerators.push(num);
    denominators.push(den);
}

// Batch invert all denominators at once (much faster than n-1 individual inversions)
FieldElement::inplace_batch_inverse(&mut denominators).expect(
    "batch inversion failed in permutation polynomial: beta and gamma should prevent zeros",
);

// Compute coefficients using the inverted denominators
let mut coefficients: Vec<FieldElement<F>> = Vec::with_capacity(cpi.n);
coefficients.push(FieldElement::one());

for i in 0..n_minus_1 {
    let factor = &numerators[i] * &denominators[i];
    let new_term = coefficients.last().expect("coefficients non-empty") * &factor;
    coefficients.push(new_term);
}

Permutation polynomial off-by-one

round_2 now computes n_minus_1 = cpi.n - 1 and only generates coefficients of length cpi.n by pushing 1 plus n-1 terms. The original loop was for i in 0..&cpi.n - 1, which produced exactly n coefficients only if the range was interpreted as 0..(n) (due to &cpi.n - 1 precedence). With the new explicit n-1, coefficients becomes length n, but this assumes the protocol wants exactly n Lagrange evaluations and that z[n-1] is the last term. If the intended range was actually 0..(n) (common in PLONK for building z over the whole domain and then separately enforcing boundary conditions), this change silently drops one multiplication step and changes p_z.

Please double-check the intended index range and coefficient count here against the protocol and existing tests/vectors, and adjust the loop bounds accordingly.

Also appears in: crates/provers/plonk/src/prover.rs:348-355 (numerator/denominator collection uses same n_minus_1).

Comment on lines 442 to 452
// Optimization: p_x = X (identity polynomial), so p_x(offset * ω^i) = offset * ω^i.
// Generate the coset directly instead of using FFT.
let omega = F::get_primitive_root_of_unity(degree.trailing_zeros() as u64)
    .expect("primitive root exists for degree");
let p_x_eval: Vec<_> = (0..degree)
    .scan(offset.clone(), |current, _| {
        let val = current.clone();
        *current = &*current * &omega;
        Some(val)
    })
    .collect();

Coset omega mismatch

p_x_eval is now generated using omega = F::get_primitive_root_of_unity(degree.trailing_zeros() as u64), and then iterating offset * omega^i. This must match the exact ordering/roots used by Polynomial::evaluate_offset_fft(..., Some(degree), offset) for the other polynomials in this round. If evaluate_offset_fft uses a different primitive root selection (or a different omega direction / bit-reversal ordering), p_x_eval will be on a different evaluation set/order than p_a_eval, p_b_eval, etc., breaking the quotient construction.

To make this safe, derive the root/ordering from the same helper used by evaluate_offset_fft (or replace this with a call that produces the same points deterministically), rather than re-deriving omega locally.
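One way to act on this suggestion, sketched against the snippet above (hypothetical placement; it reuses only the evaluate_offset_fft call already present elsewhere in this diff):

```rust
// Hypothetical debug cross-check: compare the manually generated coset against
// the FFT path for p_x itself, so any divergence in root selection or ordering
// fails fast in debug builds instead of producing an invalid proof.
#[cfg(debug_assertions)]
{
    let p_x_eval_fft = Polynomial::evaluate_offset_fft(p_x, 1, Some(degree), offset)
        .expect("FFT evaluation of p_x must be within field's two-adicity limit");
    debug_assert_eq!(
        p_x_eval, p_x_eval_fft,
        "manual coset generation must match evaluate_offset_fft ordering"
    );
}
```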

Comment on lines 526 to 550
// Optimization: Z_H(x) = x^n - 1 has only 4 distinct values on a coset of size 4n.
// On coset {offset * ω^i : i = 0..4n-1} where ω is primitive 4n-th root:
//   Z_H(offset * ω^i) = offset^n * (ω^n)^i - 1
// Since ω^n is a 4th root of unity, (ω^n)^i cycles through 4 values.
let omega_4n = F::get_primitive_root_of_unity(degree.trailing_zeros() as u64)
    .expect("primitive root exists for degree");
let omega_n = omega_4n.pow(cpi.n as u64); // ω^n where ω is 4n-th root; this is a 4th root of unity
let offset_to_n = offset.pow(cpi.n as u64);

// Compute the 4 distinct Z_H values and their inverses.
// Use a multiplication chain for small powers (faster than pow).
let omega_n_sq = &omega_n * &omega_n;
let omega_n_cubed = &omega_n_sq * &omega_n;
let mut zh_base = [
    &offset_to_n - FieldElement::<F>::one(),                  // i ≡ 0 (mod 4)
    &offset_to_n * &omega_n - FieldElement::<F>::one(),       // i ≡ 1 (mod 4)
    &offset_to_n * &omega_n_sq - FieldElement::<F>::one(),    // i ≡ 2 (mod 4)
    &offset_to_n * &omega_n_cubed - FieldElement::<F>::one(), // i ≡ 3 (mod 4)
];
FieldElement::inplace_batch_inverse(&mut zh_base)
    .expect("Z_H evaluations are non-zero on coset offset from roots of unity");

// Build full evaluation vector by cycling through the 4 values
let zh_eval: Vec<_> = (0..degree).map(|i| zh_base[i % 4].clone()).collect();
let c: Vec<_> = p_eval

Z_H shortcut assumes degree=4n

The new Z_H evaluation shortcut relies on degree = 4 * cpi.n so that omega_4n is a primitive 4n-th root and omega_n = omega_4n.pow(n) is a 4th root of unity. This is only valid under that exact relationship and when degree is exactly 4n in the FFT calls. If degree changes (the code even has a TODO about “factor of 4”), or if evaluate_offset_fft internally uses a different root/order, then zh_eval will be wrong and the quotient t will be computed incorrectly.

At minimum, add an assertion that degree == 4 * cpi.n here (and/or compute omega_n from the actual evaluation root used by the FFT helper), so the optimization can’t silently produce invalid proofs when degree is adjusted.

@github-actions bot left a comment

Code review by ChatGPT

for constraint in &self.constraints {
    let a = assignments.get(&constraint.l);
    let b = assignments.get(&constraint.r);
    let c = assignments.get(&constraint.o);

Correctness

  • The changes involve iteration over constraints with assignments. Ensure that solve_hint and solve_constraint are correctly transforming the assignments, maintaining invariant properties for each constraint.
  • Consider edge cases where assignments may initially be empty or when constraints have zero or identity elements which might produce unexpected results.

Security

  • Verify if solve_hint and solve_constraint operate in constant-time, especially if handling secret data to prevent timing side-channel attacks.
  • Check if sensitive data within assignments requires zeroization after processing.

Performance

  • Iterating over all constraints on every pass may be inefficient when self.constraints is large. Consider breaking out of a pass early once an assignment changes, if the solving order allows it.

Bugs & Errors

  • Confirm that Variable identifiers are correctly handled and that any possible conversion or retrieval of hash map values does not panic (e.g., due to missing constraints or invalid keys).

Code Simplicity

  • Current loop structure with assignments = solve_hint(assignments, constraint); and assignments = solve_constraint(assignments, constraint); may introduce redundancy. If possible, merging these processes could simplify the implementation, reducing cognitive load.

Overall Comment

Ensure thorough testing for edge cases and data-related issues. Verify if a more efficient constraint solving methodology can be implemented to enhance performance without compromising correctness and security.
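To make the fixed-point loop pattern flagged above concrete, here is a self-contained toy version; Variable, Constraint, and the solving rule are stand-ins, not the lambdaworks types:

```rust
use std::collections::HashMap;

type Variable = u32;

struct Constraint {
    l: Variable,
    r: Variable,
    o: Variable,
}

// Toy solving rule: if both inputs are assigned, the output is their sum.
fn solve_constraint(mut a: HashMap<Variable, u64>, c: &Constraint) -> HashMap<Variable, u64> {
    let (x, y) = (a.get(&c.l).copied(), a.get(&c.r).copied());
    if let (Some(x), Some(y)) = (x, y) {
        a.entry(c.o).or_insert(x + y);
    }
    a
}

fn main() {
    let constraints = vec![
        Constraint { l: 0, r: 1, o: 2 },
        Constraint { l: 2, r: 2, o: 3 },
    ];
    let mut assignments: HashMap<Variable, u64> = HashMap::from([(0, 1), (1, 2)]);
    loop {
        // assignments.len() counts solved variables; a full pass that adds
        // nothing means a fixed point was reached and the loop can stop.
        let old_solved = assignments.len();
        for constraint in &constraints {
            assignments = solve_constraint(assignments, constraint);
        }
        if assignments.len() == old_solved {
            break;
        }
    }
    assert_eq!(assignments[&3], 6); // 1 + 2 = 3, then 3 + 3 = 6
}
```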

@github-actions

github-actions bot commented Feb 5, 2026

Kimi AI Review

Review Feedback

File: crates/provers/plonk/src/constraint_system/mod.rs

Line 193-200:

  • Simplification: The code has been improved by using filter_map and collect to reduce the number of lines and eliminate the need for an intermediate public_inputs vector. This is a good optimization that makes the code more concise and efficient.

File: crates/provers/plonk/src/constraint_system/solver.rs

Line 19-28:

    • Code Clarity: Iterating over &self.constraints instead of calling self.constraints.iter() is more idiomatic Rust; the two forms compile to the same iteration, so this is purely a readability improvement.

Line 39-48:

  • Code Clarity: Similar to the previous change, using &self.constraints improves code clarity and performance.

File: crates/provers/plonk/src/prover.rs

Line 334-372:

  • Performance: The changes in this section significantly improve the performance of the permutation polynomial computation by:
    • Computing all numerators and denominators first, which avoids repeated calculations.
    • Batch inverting the denominators, which is more efficient than individual inversions.
    • Using a more efficient approach to compute the coefficients.
      These changes are excellent for performance optimization.

Line 386-438:

  • Performance: The optimizations here are also significant:
    • Direct computation of p_x_eval using a coset approach instead of FFT, which is more efficient.
    • Precomputation of zh_eval values and their inverses, avoiding repeated FFT evaluations.
      These changes are beneficial for performance.

Line 583-651:

  • Performance: The precomputation of zeta powers and the efficient computation of l1_zeta and alpha_squared are good optimizations. They reduce redundant exponentiations and improve performance; a sketch of the ζ-power reuse follows below.
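As a concrete illustration of the ζ-power reuse (toy arithmetic mod a small prime; the names and values are assumptions, not the PR's exact code):

```rust
fn main() {
    const P: u64 = 101;
    let (zeta, n) = (5u64, 8usize);
    // Naive reference: one multiplication per unit of exponent.
    let naive = |e: usize| (0..e).fold(1u64, |acc, _| acc * zeta % P);

    // Pay for ζ^n once, then derive the other powers by cheap multiplications.
    let zeta_n = naive(n);
    let zeta_sq = zeta * zeta % P;
    let zeta_n_plus_2 = zeta_n * zeta_sq % P;               // ζ^(n+2)
    let zeta_2n_plus_4 = zeta_n_plus_2 * zeta_n_plus_2 % P; // ζ^(2n+4) = (ζ^(n+2))²

    assert_eq!(zeta_n_plus_2, naive(n + 2));
    assert_eq!(zeta_2n_plus_4, naive(2 * n + 4));
}
```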

File: crates/provers/plonk/src/test_utils/utils.rs

Line 34-59:

  • Performance: The iterative multiplication approach for generating powers of g1 and omega is more efficient than using pow(). This change is good for performance.

Line 72-81:

  • Performance: The precomputation of w powers using iterative multiplication is more efficient than repeated exponentiation. This change is beneficial for performance; a sketch of the pattern follows below.
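A standalone sketch of the iterative-powers pattern (toy field; mirrors the scan approach the PR description mentions):

```rust
fn main() {
    const P: u64 = 101;
    let (omega, n) = (6u64, 8usize); // toy generator and domain size

    // Each step costs one modular multiplication, versus one full
    // exponentiation per element if pow() were called inside a map.
    let domain: Vec<u64> = (0..n)
        .scan(1u64, |acc, _| {
            let current = *acc;
            *acc = *acc * omega % P;
            Some(current)
        })
        .collect();

    // Reference: compute each power independently.
    let pow = |e: usize| (0..e).fold(1u64, |acc, _| acc * omega % P);
    let expected: Vec<u64> = (0..n).map(pow).collect();
    assert_eq!(domain, expected);
}
```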

File: crates/provers/plonk/src/verifier.rs

Line 326-411:

  • Performance: The precomputation of zeta powers and the efficient computation of l1_zeta and alpha_squared are good optimizations. They reduce the computational complexity and improve performance.

Conclusion

The changes in this PR are focused on performance optimizations and code clarity. The use of more efficient algorithms, batch operations, and precomputation of values significantly improve the performance of the library. The code changes are well-justified and align with best practices in Rust programming. Overall, the PR is well-implemented and should be merged.

diegokingston added a commit that referenced this pull request Feb 5, 2026
Address reviewer concerns from PR #1143:
- Add debug_assert verifying p_x_eval length matches FFT evaluation length
- Add debug_assert enforcing degree == 4n assumption for Z_H optimization
- Add clarifying comment explaining permutation polynomial index range is correct
- Add safety comments documenting the relationship between manual coset generation
  and FFT's primitive root derivation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@github-actions bot left a comment

Code review by ChatGPT

// Compute coefficients using the inverted denominators
let mut coefficients: Vec<FieldElement<F>> = Vec::with_capacity(cpi.n);
coefficients.push(FieldElement::one());

Correctness

  • Primitive Root Assumptions: You correctly assert the existence of a primitive root using expect, but ensure that the assumptions about the degree calculations hold under all cases, especially related to trailing zeros and field limits.
  • Edge Cases: Ensure that assumptions about degree == 4 * n are valid for all input scenarios beyond just the constraints in the comments. Changes in input domain characteristics could lead to edge cases being improperly handled.

Security

  • Timing Side-Channels: Code involving field elements often needs to be constant time. Ensure that additions using + and multiplications with * on field elements are constant time to avoid leaks.
  • SAFETY Assertions: Assertions are used for optimization checks, but consider using them in a way that does not leak information in a release build, or ensure they compile out entirely, depending on how critical they could be as attack vectors.

Performance

  • Unnecessary Allocations: The direct use of Vec::with_capacity(n_minus_1) can be justified if filled without extraneous allocations. After ensuring correctness, look for other parts of the code that might perform unnecessary allocations tightly connected with mathematical operations.

Bugs & Errors

  • Panics and Debug Assertions: Although debug assertions are useful, they should not reach a production code path that could panic under an invalid constraint. Consider a strategy to gracefully handle these conditions.
  • Potential Panics with expect: While you use expect in multiple places, ensure that the message clearly describes the failure context and, if possible, check those conditions upfront to avoid panics.

Code Simplicity

  • Complex Assertions: Perhaps consider encapsulating complex checks like the degree match into helper functions, potentially improving code readability and reuse.

Overall, assumptions in mathematical operations need further scrutiny to ensure that the constraints and debug assertions align with real-world and adversarial inputs. Consider potential refactorings or performance enhancements after addressing foundational issues.

@github-actions

github-actions bot commented Feb 5, 2026

Kimi AI Review

Here are my specific comments on the provided PR diff:

  1. crates/provers/plonk/src/constraint_system/mod.rs

    • The change from lines 193-198 to lines 193-196 looks good. The use of filter_map and collect is more idiomatic Rust and avoids unnecessary cloning of public_inputs. This change is correct and improves performance by reducing allocations.
  2. crates/provers/plonk/src/constraint_system/solver.rs

    • The change from lines 19-20 to lines 19-20 is correct. Using assignments.len() instead of assignments.keys().len() is more efficient as it avoids an extra call to keys().
  3. crates/provers/plonk/src/prover.rs

    • The changes from lines 334-336 to lines 334-341 look good. Computing all numerators and denominators first before inverting the denominators is a good optimization. Batch inverting the denominators is also more efficient than individual inversions.

    • The changes from lines 386-388 to lines 399-408 are correct. The optimization to directly generate the coset for p_x_eval instead of using FFT is valid since p_x is the identity polynomial. This change improves performance.

    • The changes from lines 430-432 to lines 440-450 are good. The optimization to directly compute zh_eval for Z_H(x) = x^n - 1 is valid given the constraints on degree. This change improves performance.

    • The changes from lines 505-506 to lines 532-533 are correct. Computing zeta_raised_n and zeta_raised_2n more efficiently by avoiding redundant exponentiations is a good optimization.

    • The change from lines 583-584 to lines 637-638 is correct. Using alpha_squared instead of computing r3.alpha * r3.alpha repeatedly is a good optimization.

  4. crates/provers/plonk/src/test_utils/utils.rs

    • The changes from lines 34-38 to lines 34-42 look good. Using iterative multiplication instead of pow() for generating powers is more efficient.

    • The changes from lines 59-61 to lines 67-81 are correct. The optimization to generate the domain more efficiently using iterative multiplication is valid.

    • The changes from lines 72-76 to lines 81-90 are good. The optimization to generate the permutation coefficients more efficiently using precomputed powers is valid.

  5. crates/provers/plonk/src/verifier.rs

    • The changes from lines 326-328 to lines 326-333 are correct. Computing zh_zeta and l1_zeta more efficiently by avoiding redundant exponentiations is a good optimization.

    • The changes from lines 362-364 to lines 371-372 are good. Using alpha_squared instead of computing alpha * alpha repeatedly is a good optimization.

    • The changes from lines 372-376 to lines 381-382 are correct. Computing partial_t_1 more efficiently by avoiding redundant exponentiations is a good optimization.

Overall, these changes look good. They improve performance by reducing unnecessary allocations, using more efficient algorithms, and avoiding redundant computations. The mathematical correctness and cryptographic security aspects seem unchanged. The code is also simplified and more maintainable.

@codecov-commenter

codecov-commenter commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 99.37500% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.33%. Comparing base (06d07ba) to head (27b6e23).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ates/provers/plonk/src/test_utils/circuit_large.rs | 98.80% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1143      +/-   ##
==========================================
+ Coverage   73.18%   73.33%   +0.14%     
==========================================
  Files         176      177       +1     
  Lines       39491    39711     +220     
==========================================
+ Hits        28903    29123     +220     
  Misses      10588    10588              

- Round 2: Replace n-1 individual divisions with batch inversion
  (reduces field inversions from O(n) to O(1) via Montgomery's trick)

- Round 3: Skip FFT for vanishing polynomial Z_H = x^n - 1
  (Z_H has only 4 distinct values on a 4n coset; compute directly)

- Round 3: Skip FFT for identity polynomial p_x = X
  (generate coset values directly as offset * ω^i)

- Use multiplication chains instead of pow() for small exponents
Prover Round 5:
- Compute zeta^n once, derive zeta^(n+2) and zeta^(2n+4) via multiplications
- Precompute alpha^2 instead of computing it inline

Verifier:
- Same zeta power optimization (eliminates 2 redundant exponentiations)
- Precompute alpha^2 for reuse
- Reuse zh_zeta (which equals zeta^n - 1) for l1_zeta computation
- Simplify partial_t_1 computation with precomputed powers
- Replace pow() in test_srs and identity_permutation with iterative multiplication
- Use scan instead of fold with unwrap in generate_domain
- Simplify generate_permutation_coefficients using iterator
- Simplify public_input_values using filter_map
- Fix redundant alpha*alpha computation in verifier (use precomputed alpha_squared)
- Use HashMap::len() instead of HashMap::keys().len()
- Use reference iteration (&self.constraints) instead of .iter()
Address reviewer concerns from PR #1143:
- Add debug_assert verifying p_x_eval length matches FFT evaluation length
- Add debug_assert enforcing degree == 4n assumption for Z_H optimization
- Add clarifying comment explaining permutation polynomial index range is correct
- Add safety comments documenting the relationship between manual coset generation
  and FFT's primitive root derivation
@diegokingston force-pushed the perf/plonk-optimizations branch from 5d88a65 to 5d17814 on February 5, 2026 21:42

@github-actions bot left a comment

Code review by ChatGPT

let p_pi_eval = Polynomial::evaluate_offset_fft(&p_pi, 1, Some(degree), offset)
    .expect("FFT evaluation of p_pi must be within field's two-adicity limit");
let p_x_eval = Polynomial::evaluate_offset_fft(p_x, 1, Some(degree), offset)
    .expect("FFT evaluation of p_x must be within field's two-adicity limit");

Correctness

  • The code makes several assumptions about the existence of primitive roots which are verified using expect. While the debug assertions help ensure these assumptions hold, they can lead to panics in release builds if the conditions change. Consider replacing these with proper error handling or ensuring that the assumptions are always validated.
  • The safety comment regarding the assumption degree == 4 * n should be clearly documented and verified elsewhere to ensure this invariant is maintained across future code changes.

Security

  • There are no measures mentioned regarding cryptographically secure randomness, which may be important depending on where and how this code is used.
  • There is no explicit mention of constant-time operations or handling of secret values without timing side-channels. Ensure that sensitive operations do not introduce timing attacks.
  • The provided code does not indicate handling of zeroization of sensitive data.

Performance

  • The use of debug assertions is a good practice, but also consider adding release-build checks or optimizations that do not rely solely on debug-build diagnostics.
  • Ensure that unnecessary allocations are minimized especially in high-demand functions such as FFT evaluations.

Bugs & Errors

  • The use of expect statements can lead to panics which should be avoided especially in a cryptographic library.
  • Relying on panic behavior is risky; consider using a more robust error handling mechanism.

Code Simplicity

  • The explanations within the comments are informative and help in understanding the logic of the implementation. Make sure these assumptions and optimizations are also reflected in user-facing documentation.

Consider addressing the above issues before merging to ensure robustness and security of the cryptographic operations.

@github-actions

github-actions bot commented Feb 5, 2026

Kimi AI Review

Here are my specific feedback and comments on the provided PR diff:

  1. File: crates/provers/plonk/src/constraint_system/mod.rs

    • Lines 193-200: The change from a manual loop to filter_map and collect is a good optimization for collecting public inputs. This reduces the number of allocations and makes the code more concise. Looks good!
  2. File: crates/provers/plonk/src/constraint_system/solver.rs

    • Lines 19-27: Changing keys().len() to len() is more idiomatic Rust and avoids unnecessary method calls. This is a good change.
    • Lines 33-40: Similar change in the loop condition from keys().len() to len(). Good optimization.
  3. File: crates/provers/plonk/src/prover.rs

    • Lines 334-371: The optimization for computing permutation polynomial coefficients is well-implemented. It reduces the number of divisions by batching the inversions and pre-computing factors. This is a significant performance improvement.
    • Lines 386-439: The optimization for evaluating polynomials at specific points using direct computation instead of FFT is clever. It takes advantage of the structure of the polynomials and the properties of the field elements. This is a good optimization.
    • Lines 505-531: The changes to compute powers of zeta efficiently are correct. It avoids redundant calculations and makes the code more concise.
  4. File: crates/provers/plonk/src/test_utils/utils.rs

    • Lines 34-54: The changes to generate powers of g1 and omega iteratively are more efficient than using pow() in a map. This is a good optimization.
    • Lines 59-81: The changes to generate the permutation coefficients and the identity permutation are correct and make the code more efficient by avoiding unnecessary computations.
  5. File: crates/provers/plonk/src/verifier.rs

    • Lines 326-411: The optimizations for computing powers of zeta and using them in the verification process are well-done. It reduces the number of exponentiations and makes the code more concise.

Overall, the changes in this PR are focused on performance optimizations and code simplification. The optimizations are well-implemented and should significantly improve the performance of the PLONK prover and verifier. The code changes are also more idiomatic Rust, which makes the code easier to maintain and understand. Good job on the refactor!

One minor suggestion: Consider adding comments to explain the optimizations, especially for the more complex ones involving polynomial evaluations and field element computations. This will help other developers understand the rationale behind the changes.

@diegokingston
Collaborator Author

@greptile

@github-actions

github-actions bot commented Feb 6, 2026

Kimi AI Review

Here are my specific comments on the PR diff:

  1. mod.rs:193-207

    • The change from a manual loop to filter_map and collect is a good optimization. It reduces the number of allocations and improves code conciseness.
    • However, consider adding a comment explaining why this change is safe, i.e., why values.get(key).cloned() is equivalent to the original loop.
  2. solver.rs:19-40

    • The change from keys() to len() is good for avoiding an extra iteration over the keys.
    • The loop logic seems correct. However, consider adding a comment explaining why the loop terminates when old_solved == assignments.len().
  3. prover.rs:334-372

    • The optimization to precompute numerators and denominators before computing coefficients is good for performance.
    • The batch inversion of denominators is a nice optimization. However, consider adding a comment explaining why this is safe, i.e., why we can assume denominators are non-zero.
    • The optimization for p_x_eval is clever. However, the TODO comment about the degree formula should be addressed. Consider adding a more detailed comment explaining the assumption and what would need to change if the degree formula changes.
  4. prover.rs:583-636

    • The optimizations to precompute zeta powers and alpha squared are good for performance.
    • The TODO comment about the paper discrepancy should be addressed. Consider adding a more detailed comment explaining the discrepancy and why the code uses the current values.
  5. utils.rs:34-56

    • The optimizations to use iterative multiplication instead of pow() are good for performance.
    • The code seems correct, but consider adding comments explaining the optimizations and why they are safe.
  6. verifier.rs:326-411

    • The optimizations to precompute zeta powers and alpha squared are good for performance.
    • The code seems correct, but consider adding comments explaining the optimizations and why they are safe.

Overall, the changes seem to improve performance and code conciseness while maintaining correctness. However, some TODO comments should be addressed with more detailed explanations. Adding comments to explain the optimizations and assumptions would also improve code readability and maintainability.

@greptile-apps

greptile-apps bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

  • Optimizes PLONK prover rounds by batching inversions in the permutation product and avoiding FFTs for p_x = X and Z_H(x)=x^n-1 evaluations on the 4n coset.
  • Replaces some small exponent pow() calls with multiplication chains; verifier mirrors this with precomputed ζ-powers/α².
  • Refactors small iterator patterns in constraint system and solver; updates test utility generators to use iterative multiplication.
  • Main correctness sensitivity is that the new direct Z_H evaluation path depends on degree == 4n and must be enforced in release builds to avoid silently producing invalid proofs if the degree logic changes.

Confidence Score: 4/5

  • Generally safe to merge once the release-mode correctness guard is fixed.
  • Core changes are performance-focused and mostly preserve existing semantics; however, Z_H direct evaluation currently relies on a debug-only assertion, which can silently break proofs in release if degree changes. Test util change also alters the failure mode for invalid permutation slices and should be made explicit.
  • crates/provers/plonk/src/prover.rs (Z_H direct eval guard), crates/provers/plonk/src/test_utils/utils.rs (permutation slice contract)

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/provers/plonk/src/constraint_system/mod.rs | Minor refactor: public input values now collected via iterator/filter_map; no behavior change expected. |
| crates/provers/plonk/src/constraint_system/solver.rs | Minor loop refactor using assignments.len() and iterating by reference; no functional change. |
| crates/provers/plonk/src/prover.rs | Adds batch inversion in round 2, direct coset evaluations for p_x and Z_H, and small exponent mult chains; found a release-mode safety issue (debug_assert on degree==4n). |
| crates/provers/plonk/src/test_utils/utils.rs | Optimizes test SRS/domain/permutation generation with iterative multiplication; found a behavioral change where the permutation slice is silently truncated, masking length requirements and risking panics depending on contents. |
| crates/provers/plonk/src/verifier.rs | Precomputes zeta powers and alpha^2 to avoid repeated exponentiation; no functional change detected. |

Sequence Diagram

sequenceDiagram
  autonumber
  participant Prover
  participant Poly as Polynomial/FFT
  participant Field as FieldElement
  participant CS as CommitmentScheme
  participant Verifier

  Prover->>Prover: Round 2 - build permutation factors
  Prover->>Field: collect denominators[0..n-2]
  Prover->>Field: inplace_batch_inverse(denominators)
  Prover->>Prover: accumulate z coefficients using num[i]*den_inv[i]

  Prover->>Poly: Round 3 - evaluate witness polys on coset via evaluate_offset_fft
  Prover->>Field: generate p_x_eval directly: offset * ω^i
  Prover->>Field: compute zh_eval directly (4-cycle) and batch invert 4 values
  Prover->>Poly: interpolate_offset_fft(c = p_eval * zh_eval)
  Prover->>CS: commit to quotient/linearization polys

  Verifier->>Verifier: compute challenges (β, γ, α, ζ, υ)
  Verifier->>Field: precompute ζ^n, ζ^(n+2), ζ^(2n+4), α^2
  Verifier->>Verifier: check constraints: p(ζ) == Z_H(ζ) * t(ζ)
  Verifier->>CS: verify_batch openings at ζ
  Verifier->>CS: verify single opening at ζω
  Verifier-->>Verifier: accept/reject

@greptile-apps bot left a comment

5 files reviewed, 2 comments

Comment on lines +541 to 545
debug_assert_eq!(
    degree,
    4 * cpi.n,
    "Z_H optimization requires degree == 4n; if degree formula changes, update this code"
);

Debug-only assumption check

degree == 4 * cpi.n is only enforced with debug_assert_eq!, so in release builds this optimization can silently produce incorrect Z_H evaluations if degree ever changes (the surrounding code even has a TODO about the factor of 4). This should be a real assert_eq! (or a checked branch that falls back to the FFT path) so invalid proofs can’t be produced in optimized builds.
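A minimal version of the suggested guard (sketch only; whether to assert hard or fall back to the FFT path is the authors' call):

```rust
// Sketch: enforce the 4n relationship in release builds too, so the shortcut
// can never silently produce wrong Z_H values if the degree formula changes.
assert_eq!(degree, 4 * cpi.n, "Z_H 4-cycle shortcut requires degree == 4n");
```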

@greptile-apps

greptile-apps bot commented Feb 6, 2026

Additional Comments (1)

crates/provers/plonk/src/test_utils/utils.rs
Out-of-bounds on short permutation

generate_permutation_coefficients now does permutation.iter().take(n * 3).map(|&i| identity[i].clone()). If permutation.len() < 3n, the .take(3n) still iterates permutation.len() elements, but any element value i outside 0..identity.len() will panic with an out-of-bounds index. The previous code indexed permutation[i] for i in 0..3n, which would also panic when permutation.len() < 3n, but it guaranteed the input length requirement; the new code makes it easier to accidentally pass a shorter slice and get a panic that depends on the slice contents.

If the function requires permutation.len() >= 3n and all entries < 3n, it should enforce that (e.g., assert_eq!(permutation.len(), 3*n) / bounds-check) rather than silently truncating.
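A sketch of the suggested contract check (hypothetical names where not shown in the diff):

```rust
// Sketch: make the 3n length requirement explicit instead of silently truncating.
assert_eq!(
    permutation.len(),
    3 * n,
    "permutation must contain exactly 3n indices"
);
let coefficients: Vec<_> = permutation.iter().map(|&i| identity[i].clone()).collect();
```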

@github-actions

github-actions bot commented Feb 9, 2026

Kimi AI Review

Review Feedback

constraint_system/mod.rs

File: crates/provers/plonk/src/constraint_system/mod.rs

  1. Line 193-200: The change from using a Vec to collect public inputs to using filter_map and collect is an improvement for performance and code simplicity. This change avoids unnecessary cloning of FieldElement when constructing the public_inputs vector. It also makes the code more concise and idiomatic Rust. Approved.

constraint_system/solver.rs

File: crates/provers/plonk/src/constraint_system/solver.rs

  1. Line 19-27: The change from using keys().len() to len() is a minor improvement for readability and conciseness. It directly uses the HashMap's len method, which is more straightforward than accessing the keys and then getting the length. Approved.

prover.rs

File: crates/provers/plonk/src/prover.rs

  1. Line 334-372: The optimization in computing permutation polynomial coefficients is significant. It avoids redundant division operations and uses batch inversion for efficiency. This change is correct and improves performance. Approved.
  2. Line 386-439: The optimization for evaluating polynomials at specific points is well-implemented. It correctly uses the properties of roots of unity and the structure of the polynomial to avoid unnecessary computations. The code is mathematically correct and improves performance. Approved.
  3. Line 505-531: The computation of zeta powers and the subsequent calculations are optimized for performance. The use of pow is replaced with iterative multiplication, which is more efficient. The code is correct and improves performance. Approved.

test_utils/utils.rs

File: crates/provers/plonk/src/test_utils/utils.rs

  1. Line 34-51: The changes to generate powers of a group element and a domain are optimized for performance. Using iterative multiplication instead of repeated pow calls is more efficient. The code is correct and improves performance. Approved.

verifier.rs

File: crates/provers/plonk/src/verifier.rs

  1. Line 326-381: The optimizations for computing powers of zeta and other related calculations are correct and improve performance. The use of iterative multiplication and batch operations is more efficient than repeated pow calls. Approved.

Summary

The changes in this PR are focused on improving performance and code simplicity. The optimizations are mathematically correct and adhere to best practices in Rust programming. The changes are approved as they enhance the library's efficiency without compromising correctness or security.

@nicole-graus
Contributor

Benchmark Results

Environment 1:

Apple M1 8 cores (8 threads - no hyperthreading)

| Circuit Size (n) | Baseline (main) | Optimized (PR) | Change |
| --- | --- | --- | --- |
| 4 | 5.65 ms | 5.59 ms | +0.58% |
| 8 | 9.00 ms | 9.07 ms | +0.43% |
| 16 | 13.39 ms | 13.31 ms | -0.78% |
| 32 | 21.25 ms | 21.55 ms | +1.47% |
| 64 | 34.16 ms | 34.07 ms | +0.06% |
| 128 | 60.51 ms | 58.82 ms | -1.85% |
| 256 | 95.17 ms | 115.55 ms | +11.73% |
| 512 | 171.21 ms | 172.19 ms | +6.43% |
| 4096 | 961.11 ms | 936.89 ms | -2.52% |

Environment 2:

AMD Ryzen 5 PRO 3600 (6-Core, 12 threads)

| Circuit Size (n) | Main (ms) | Optimized (ms) | Change |
| --- | --- | --- | --- |
| 4 | 8.68 | 7.79 | -10.3% |
| 8 | 13.63 | 12.34 | -9.4% |
| 16 | 19.74 | 18.37 | -6.9% |
| 32 | 31.33 | 28.91 | -7.7% |
| 64 | 49.65 | 46.36 | -6.6% |
| 128 | 85.26 | 79.80 | -6.4% |
| 256 | 135.83 | 128.60 | -5.3% |
| 512 | 240.97 | 232.78 | -3.4% |
| 4096 | 1314.7 | 1478.2 | +12.4% |

@github-actions

github-actions bot commented Feb 9, 2026

Kimi AI Review

Review Feedback

File: crates/provers/plonk/src/constraint_system/mod.rs

Line 193-200:

  • Feedback: The change from a manual loop to filter_map and collect is an improvement for conciseness and readability. This is a good optimization as it reduces the number of lines and avoids manual cloning within the loop. No issues found.

File: crates/provers/plonk/src/constraint_system/solver.rs

Line 19-27:

  • Feedback: The change from .keys().len() to .len() is correct and more idiomatic Rust. This is a minor improvement but helps in readability and maintainability. No issues found.

Line 39-47:

  • Feedback: The loop and conditional checks are correctly updated to use .len(). This change is consistent with the previous one and maintains the logic's integrity. No issues found.

File: crates/provers/plonk/src/prover.rs

Line 334-373:

  • Feedback: The optimization of computing coefficients by pre-computing numerators and denominators, and then inverting the denominators in batch, is a significant improvement. This reduces the number of field inversions from n-1 to just one, which can greatly improve performance. The code is also more readable with this change. No issues found.

Line 386-439:

  • Feedback: The optimization for p_x_eval by directly generating the coset values instead of using FFT is excellent. This avoids unnecessary computations and leverages the properties of the field elements. The debug assertion to ensure the lengths match is a good practice for maintaining correctness. No issues found.

Line 505-531:

  • Feedback: The changes to compute zeta powers more efficiently are good. Computing consecutive powers by iterative multiplication is faster than performing a separate exponentiation for each one. The code is also more readable with these changes. No issues found.

Line 583-636:

  • Feedback: The precomputation of zeta powers and alpha squared is a good optimization. This reduces the number of field multiplications required during the computation, which can improve performance. The code is also more readable with these changes. No issues found.

File: crates/provers/plonk/src/test_utils/utils.rs

Line 34-54:

  • Feedback: The change from using pow() to iterative multiplication for generating powers is a good optimization; one multiplication per element is cheaper than a full exponentiation per element. The code is also more readable with these changes. No issues found.

Line 59-81:

  • Feedback: The changes to generate the domain and permutation coefficients are correct and improve performance by using iterative multiplication. These changes are consistent with the previous optimizations and maintain the logic's integrity. No issues found.

File: crates/provers/plonk/src/verifier.rs

Line 326-381:

  • Feedback: The optimizations for computing zeta powers and alpha squared are consistent with the changes made in the prover module. These optimizations improve performance and maintain the logic's integrity. No issues found.

Conclusion

The changes in this PR are well-thought-out and improve both performance and code readability. The optimizations for field operations, especially the use of iterative multiplication and batch inversions, are significant and correctly implemented. The code is also more concise and readable with these changes. No issues were found during the review.

@github-actions

Kimi AI Review

Review Feedback

File: crates/provers/plonk/src/constraint_system/mod.rs

Line 193-201:

  • Issue: The original code snippet clones values from the HashMap. The updated code uses filter_map and cloned to achieve the same result, which is more idiomatic Rust.
  • Suggestion: The updated code is correct and more efficient. No changes needed.

File: crates/provers/plonk/src/constraint_system/solver.rs

Line 19-31:

  • Issue: The original code snippet uses keys().len() to check for solved constraints, which is less efficient due to the extra iterator creation.
  • Suggestion: The updated code uses len() directly on the HashMap, which is more efficient and correct. No changes needed.

File: crates/provers/plonk/src/prover.rs

Line 334-371:

  • Issue: The original code snippet performs individual divisions and inversions, which is less efficient.
  • Suggestion: The updated code batches the inversions, which is more efficient. Additionally, it precomputes the numerators and denominators separately before combining them, which is a good optimization. No changes needed.

Line 386-439:

  • Issue: The original code snippet uses individual FFT evaluations for polynomials, which is less efficient.
  • Suggestion: The updated code uses more efficient methods for evaluating polynomials, especially for the identity polynomial p_x. This is a good optimization. No changes needed.

Line 505-531:

  • Issue: The original code snippet computes powers of zeta using individual exponentiations, which is less efficient.
  • Suggestion: The updated code computes powers of zeta more efficiently by reusing previous results. This is a good optimization. No changes needed.

File: crates/provers/plonk/src/test_utils/utils.rs

Line 34-59:

  • Issue: The original code snippet uses pow() for exponentiation, which is less efficient.
  • Suggestion: The updated code uses iterative multiplication for exponentiation, which is more efficient. This is a good optimization. No changes needed.

Line 72-81:

  • Issue: The original code snippet computes powers of w using individual multiplications, which is less efficient.
  • Suggestion: The updated code uses iterative multiplication for computing powers of w, which is more efficient. This is a good optimization. No changes needed.

File: crates/provers/plonk/src/verifier.rs

Line 326-381:

  • Issue: The original code snippet computes powers of zeta using individual exponentiations, which is less efficient.
  • Suggestion: The updated code computes powers of zeta more efficiently by reusing previous results. This is a good optimization. No changes needed.

Overall, the changes in the PR are focused on improving performance through more efficient computation of powers, batch inversions, and iterative multiplication. These optimizations are correct and improve the performance of the code without compromising correctness.

@github-actions

Kimi AI Review

Here are my reviews of the provided PR diff:

  1. crates/provers/plonk/Cargo.toml:

    • The addition of the rayon dependency with an optional feature parallel is appropriate for enabling parallel computation. This can improve performance on multi-core systems.
    • The addition of benchmarking dependencies and configuration is good for performance testing.
  2. crates/provers/plonk/benches/criterion_prover.rs:

    • The new benchmark file is well-structured and tests the PLONK prover with varying circuit sizes. This will help identify performance bottlenecks.
    • It's good to see the use of black_box to prevent compiler optimizations from affecting the benchmark results.
  3. crates/provers/plonk/src/constraint_system/mod.rs:

    • The change from cloning values to using filter_map and collect is a good optimization to reduce allocations. This makes the code more efficient.
    • Line 193: This change is correct and improves performance.
  4. crates/provers/plonk/src/constraint_system/solver.rs:

    • Lines 19-34: The changes to use references instead of owned values in the loop are good for reducing unnecessary cloning and improving performance.
  5. crates/provers/plonk/src/prover.rs:

    • The addition of parallel computation using Rayon is well-implemented. The use of rayon::join allows for concurrent execution of independent tasks (a minimal rayon::join sketch follows at the end of this review).
    • Lines 312-315: The changes to use Sync bounds are necessary for safe parallel computation.
    • Lines 386-418: The optimization of polynomial evaluations using batch operations is excellent. This reduces the number of individual operations and improves performance.
    • Lines 505-533: The changes to compute commitments in parallel are correct and improve performance.
    • Lines 583-611: The changes to compute zeta powers efficiently are good optimizations. This avoids redundant exponentiations and improves performance.
  6. crates/provers/plonk/src/test_utils/circuit_large.rs:

    • The new file provides a scalable test circuit by padding a smaller circuit with identity gates. This is useful for testing the PLONK prover with larger circuits.
    • The test functions for different circuit sizes are well-implemented and will help ensure the prover works correctly at various scales.
  7. crates/provers/plonk/src/test_utils/mod.rs:

    • The addition of the circuit_large module is good for organizing test utilities.
  8. crates/provers/plonk/src/test_utils/utils.rs:

    • The changes to use iterative multiplication instead of pow() are good optimizations. This improves performance by reducing the number of exponentiations.
    • The changes to use scan for generating domains and permutation coefficients are correct and improve performance by avoiding unnecessary cloning.
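
As referenced above, here is a minimal illustration of the rayon::join pattern the PR relies on, with a toy workload standing in for the prover's independent tasks (the real calls are additionally gated behind #[cfg(feature = "parallel")]); it requires the rayon crate:

```rust
// rayon::join runs both closures, potentially in parallel on the global
// thread pool, and returns both results once both complete.
fn main() {
    let xs: Vec<u64> = (1..=1_000_000u64).collect();
    let (sum, evens) = rayon::join(
        || xs.iter().sum::<u64>(),                     // independent task 1
        || xs.iter().filter(|&&x| x % 2 == 0).count(), // independent task 2
    );
    println!("sum = {sum}, evens = {evens}");
}
```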


@github-actions github-actions bot left a comment


Code review by ChatGPT


[[bench]]
name = "criterion_prover"
harness = false


Correctness

  • Mathematical Operations: Please ensure that all mathematical operations, especially modular arithmetic and field operations, handle edge cases correctly, like zero and identity elements.
  • Polynomial and FFT implementations: Check for any off-by-one errors or incorrect assumptions about polynomial degrees and FFT lengths.

Security

  • Constant Time Operations: Ensure that all cryptographic operations on secret data, including elliptic curve operations, are constant-time to avoid timing side-channels.
  • Randomness: Verify that the randomness sourced for cryptographic operations is cryptographically secure.
  • Hash Domain Separation: Ensure that all hash functions used in different contexts are domain-separated to prevent collision attacks across distinct domains.
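
To make the domain-separation point concrete, a small sketch using the sha2 crate (the tag strings are illustrative, not the project's actual transcript labels):

```rust
use sha2::{Digest, Sha256};

// Prefix every hash with a context tag so identical data hashed in
// different protocol contexts can never collide across domains.
fn tagged_hash(domain_tag: &[u8], data: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(domain_tag);
    hasher.update(data);
    hasher.finalize().into()
}

fn main() {
    let a = tagged_hash(b"PLONK/transcript/v1", b"msg");
    let b = tagged_hash(b"PLONK/commitment/v1", b"msg");
    assert_ne!(a, b); // same data, different domains, different digests
}
```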

Performance

  • Inversions and Allocations: Examine the code for any unnecessary field inversions or allocations, particularly in critical paths like MSM and FFT.

Bugs & Errors

  • Panic and Unwraps: Look for potential panic scenarios or the use of unwrap() that could lead to runtime crashes if assumptions are violated.
  • Integer Overflow: Ensure that all arithmetic operations are checked to prevent integer overflow/underflow.

Code Simplicity

  • Complex Implementations: Review the code for complex logic that can be broken down into simpler functions or better abstractions.
  • Duplicated Code: Ensure there is no duplicated logic across the codebase that can be abstracted into shared functions.

}

criterion_group!(benches, bench_plonk_prover);
criterion_main!(benches);


Correctness

  • Mathematical operations: No issues are evident in the code snippet regarding modular or field arithmetic directly, as operations seem to rely on library calls which presumably handle these safely. However, ensure that FieldElement::from and multiplication (&x * &e) wrap correctly under the modulus.
  • Edge cases: It’s unclear how test_witness_size and test_common_preprocessed_input_size handle edge values like zero, which could be significant if they lead to zero-constructed polynomials or zero division during setup.

Security

  • Timing side-channels: Ensure functions such as Kzg::new and prover.prove(...) are constant-time, especially operations within them that handle private data.
  • Zeroization: No explicit zeroization is present for sensitive data. This might be outside the scope of the benchmark, but sensitive cryptographic material should be zeroized after use.
  • Randomness: The TestRandomFieldGenerator should clearly and securely initialize to prevent predictable randomness during proofs.
  • Hash functions: Verify that any hash function usage respects domain separation, even though it’s not shown here.

Performance

  • Allocations: No unnecessary allocation is evident in this snippet beyond creating vectors, which appear intentional.
  • Redundant operations: Invocations like FieldElement::from(4_u64) can be pre-computed if possible outside loops for efficiency.
  • FFT efficiency: Not directly applicable here, but ensure FFTs in setup or proofs are optimized for sizes used (e.g., power of 2 optimizations).

Bugs & Errors

  • Panics/unwraps: No explicit panics or unwraps appear in the code. Verify any underlying library functions also handle input safely without panic.
  • Memory safety: Rust's borrow checker typically ensures memory safety, but allocations with vec![x, y] imply heap usage without apparent risk.
  • Integer overflow: FieldElement::from should inherently handle overflow safely if used with field moduli constraints.
  • Off-by-one errors: Nothing immediately evident as off-by-one, though ensure looping constructs in prover.prove() accommodate index ranges correctly.

Code Simplicity

  • Complexity: The implementation seems straightforward for benchmarking.
  • Duplicated code: None observed.
  • Abstractions: Usage of the Criterion library aids abstraction, but ensure any cryptographic process like Kzg::new() isn’t overly complicated or misused. Consider whether TestRandomFieldGenerator could wrap a more general secure random generator, reducing test complexity.

Overall, this code snippet is focused on benchmarking and setup, but more scrutiny is needed on functions like setup() and prover.prove(...) that are not shown here yet are critical for correctness and security. The code should be hardened against potential side-channel attacks and careless handling of sensitive data before production use. Be sure to thoroughly test the elliptic curve operations, setup, and SNARK logic within those unshown methods.

"p_x_eval length must match FFT evaluation length"
);

let p_constraints_eval: Vec<_> = p_a_eval


Correctness:

  • Line 580: Ensure that polynomial blinding and commitment scheme operations follow correct modular arithmetic over the field, especially with respect to the degree and offset.
  • FFT Evaluations: The code seems to compress multiple polynomial evaluations which are expected; ensure all polynomials used in FFT are correct and the evaluations are accurately accounted for.

Security:

  • Constant-time Operations: The use of parallel execution can introduce side channels due to non-constant-time operations. Ensure the operations involving secrets are constant-time to prevent timing attacks.

Performance:

  • Parallel Execution: The implementation attempts to parallelize FFT and commitment operations, which is good for performance, but the effectiveness needs to be measured to ensure this overhead doesn't outweigh the performance gains.

Bugs & Errors:

  • Error Handling: The expect statements used in FFT operations imply potential panics if preconditions are not met. Consider alternative error handling to prevent abrupt failures.

Code Simplicity:

  • Parallelization Complexity: While the parallel implementation can improve performance, it increases code complexity. Ensure that the added complexity results in measurable performance gains without introducing errors.

Additional Considerations:

  • Zeroization: Ensure any sensitive data like secrets in fields or during commitments are zeroized from memory post-operation to prevent leakage.
  • Hash Functions: If hash functions are used, ensure proper domain separation to avoid hash collisions or application errors. Ensure any random field generation uses cryptographically secure randomness.

Overall, the changes introduce potential performance improvements but raise concerns about security and complexity that need to be addressed before merging.

let verifier = Verifier::new(kzg);
assert!(verifier.verify(&proof, &public_input, &common, &vk));
}
}


Correctness

  • The function test_common_preprocessed_input_size uses unwrap on FrField::get_primitive_root_of_unity(log_n). This can panic if the root does not exist; given that all inputs are powers of two and the field parameters are correct, this is likely fine, but verifying it through tests or additional checks would enhance robustness.
  • Similarly, Polynomial::interpolate_fft::<FrField>(&ql_vals).unwrap() and others could panic if interpolation fails. Consider handling error gracefully.

Security

  • Timing Side-Channels: Ensure mathematical operations on FieldElement and involving secret data are constant-time to prevent side-channel attacks. This is not verifiable through a static code review without further context or implementation details.
  • Cryptographically Secure Randomness: The usage of TestRandomFieldGenerator as a randomness source should be scrutinized to ensure it's replaced with a cryptographically secure generator in production.
  • Proper Zeroization: No indication of explicit zeroization of sensitive data before they're deallocated.

Performance

  • The code uses vec![0; n * 3] to create the permutation vector, which zero-initializes the entire buffer even though only a subset of indices is later modified. Consider using with_capacity to preallocate memory without the initialization pass when the contents are written explicitly.
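
A sketch of the suggested pattern, applicable when every slot is written exactly once in order (build_permutation and the identity values are placeholders, not the library's code):

```rust
// Vec::with_capacity allocates once without zero-initializing; each push
// then writes the real value, avoiding the up-front fill of vec![0; 3 * n].
fn build_permutation(n: usize) -> Vec<usize> {
    let mut perm = Vec::with_capacity(3 * n);
    for i in 0..3 * n {
        perm.push(i); // identity permutation stands in for the real values
    }
    perm
}

fn main() {
    assert_eq!(build_permutation(4).len(), 12);
}
```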

Bugs & Errors

  • The code ensures that n is at least 4 and a power of two. However, if any validations change in the future, ensure tests are retained to verify this.

Code Simplicity

  • Duplicated Code: The unit test sequence for different n values is repetitive. Consider using parameterized tests to avoid code duplication.

Overall, while the code logic seems generally sound, the areas noted — unwraps, unnecessary initialization, and panic-prone paths — could be refined to handle error cases more gracefully and to improve the security posture in a production environment.

Consider addressing these issues before merging.

@jotabulacios
Contributor

@greptile

@greptile-apps

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

This PR implements significant performance optimizations to the PLONK prover by reducing cryptographic field operations:

Key Optimizations:

  • Round 2: Replaces n-1 individual field divisions with a single batch inversion using Montgomery's trick, reducing field inversions from O(n) to O(1)
  • Round 3: Skips FFT for vanishing polynomial Z_H(x) = x^n - 1 by exploiting that it has only 4 distinct values on a 4n-element coset (see the sketch after this list)
  • Round 3: Skips FFT for identity polynomial p_x = X by directly computing coset values as offset * ω^i
  • Global: Uses multiplication chains instead of pow() for small exponents throughout prover, verifier, and test utilities
  • Parallelization: Adds optional rayon-based parallel execution for FFT evaluations and commitment operations
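
As the Z_H bullet above references, here is a self-contained sketch of why four values suffice, over a toy u64 field (P, g, w, and the helper names are illustrative, and a real instance needs w of exact multiplicative order 4n): with w of order 4n, Z_H(g·w^i) = g^n·(w^n)^i − 1, and w^n is a 4th root of unity, so the value depends only on i mod 4.

```rust
const P: u64 = 0xffff_ffff_0000_0001; // toy prime for the example

fn mulm(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

fn powm(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    while e > 0 {
        if e & 1 == 1 {
            acc = mulm(acc, b);
        }
        b = mulm(b, b);
        e >>= 1;
    }
    acc
}

// Evaluate Z_H(x) = x^n - 1 on the coset {g * w^i : i = 0..4n-1} using only
// the 4 distinct base values, instead of a size-4n FFT.
fn zh_on_coset(g: u64, w: u64, n: u64) -> Vec<u64> {
    let gn = powm(g, n); // g^n, computed once
    let wn = powm(w, n); // a 4th root of unity when w has order 4n
    let mut base = [0u64; 4];
    let mut acc = gn;
    for slot in &mut base {
        *slot = if acc == 0 { P - 1 } else { acc - 1 }; // acc - 1 (mod P)
        acc = mulm(acc, wn);
    }
    (0..(4 * n) as usize).map(|i| base[i % 4]).collect()
}

fn main() {
    // Toy parameters only; correctness of the shortcut needs ord(w) = 4n.
    println!("{:?}", &zh_on_coset(3, 5, 4)[..8]);
}
```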

Testing & Infrastructure:

  • Adds comprehensive benchmark suite using criterion for circuits of size 4096, 8192, and 16384 gates
  • Adds large circuit test utilities with tests at multiple sizes (n=16, 64, 256, 512, 4096)
  • All existing unit tests preserved and pass with optimized code

The optimizations are mathematically sound and leverage domain-specific properties of the PLONK protocol. However, previous review threads have identified critical assumptions (particularly degree == 4n) that are currently only enforced via debug_assert_eq!, meaning they could be violated in release builds if the degree formula changes.

Confidence Score: 4/5

  • This PR is safe to merge with minor risks related to assumption validation in release builds
  • Score reflects well-tested, mathematically sound optimizations with comprehensive test coverage, but previous review threads identified that critical assumptions (degree==4n) are only enforced via debug_assert_eq!, creating potential for silent correctness issues in release builds if degree formula changes. The optimizations themselves are correct when assumptions hold.
  • Pay close attention to crates/provers/plonk/src/prover.rs lines 572-576 where the Z_H optimization relies on degree==4n being enforced only via debug_assert_eq!

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/provers/plonk/src/prover.rs | Major performance optimizations: batch inversion in round_2, direct Z_H/p_x evaluation in round_3, parallel FFT evaluations, and multiplication chains for small exponents. Previous threads identified critical assumptions about degree==4n that need verification. |
| crates/provers/plonk/src/verifier.rs | Mirrors prover optimizations: precomputes zeta powers efficiently using multiplication chains instead of repeated pow() calls, and reuses alpha_squared. Straightforward performance improvements with no correctness concerns. |
| crates/provers/plonk/Cargo.toml | Adds optional rayon dependency for parallelization, criterion for benchmarking, and defines the parallel feature flag. All changes are appropriate for the performance optimization work. |
| crates/provers/plonk/src/test_utils/circuit_large.rs | New test utilities for generating large circuits by padding circuit_1 with identity gates. Includes comprehensive tests at multiple sizes (n=16, 64, 256, 512, 4096). Implementation is clean and well-tested. |

Sequence Diagram

sequenceDiagram
    participant Prover
    participant Round1
    participant Round2
    participant Round3
    participant Round4
    participant Round5
    
    Prover->>Round1: Commit wire polynomials (a, b, c)
    Note over Round1: Parallel commitments (3x)
    Round1-->>Prover: a_1, b_1, c_1
    
    Prover->>Round2: Sample β, γ from transcript
    Prover->>Round2: Build permutation polynomial z
    Note over Round2: Batch inversion (n-1 denominators)
    Round2-->>Prover: z_1
    
    Prover->>Round3: Sample α from transcript
    Prover->>Round3: Compute quotient polynomial t
    Note over Round3: Parallel FFT evaluations (15 polys)<br/>Direct Z_H computation (4 values)<br/>Direct p_x generation (no FFT)
    Round3-->>Prover: t_lo_1, t_mid_1, t_hi_1
    
    Prover->>Round4: Sample ζ from transcript
    Prover->>Round4: Evaluate polynomials at ζ
    Note over Round4: Multiplication chains for zeta powers
    Round4-->>Prover: a_ζ, b_ζ, c_ζ, s1_ζ, s2_ζ, z_ζω
    
    Prover->>Round5: Sample υ from transcript
    Prover->>Round5: Generate opening proofs
    Note over Round5: Parallel batch opening
    Round5-->>Prover: w_ζ_1, w_ζω_1

@jotabulacios
Contributor

Updated benchmark results

Apple M1 8 cores with --features parallel

| Circuit Size (n) | Time (ms) |
| --- | --- |
| 4096 | 812.19 |
| 8192 | 1503.3 |
| 16384 | 2728.1 |

AMD Ryzen 5 PRO 3600 (6-Core, 12 threads) with --features parallel

| Circuit Size (n) | Time (ms) |
| --- | --- |
| 4096 | 599.14 |
| 8192 | 1116.1 |
| 16384 | 1987.7 |

@github-actions

Kimi AI Review

Here are my specific comments on the PR:

  1. crates/provers/plonk/Cargo.toml:

    • Adding rayon for parallelism is a good optimization. However, ensure that the parallelized code does not introduce any data races or correctness issues. The Sync trait bounds added to various types seem reasonable for safety.
  2. crates/provers/plonk/benches/criterion_prover.rs:

    • This new benchmark file looks good for testing the performance of the PLONK prover. However, consider adding more comments explaining what each benchmark tests.
  3. crates/provers/plonk/src/constraint_system/mod.rs:

    • Change from line 193-207: The new code is more concise and avoids cloning FieldElement values unnecessarily. Looks good.
  4. crates/provers/plonk/src/constraint_system/solver.rs:

    • Change from line 19-35: The loop condition and iteration now use assignments.len() instead of assignments.keys().len(), which is more idiomatic Rust. Looks good.
  5. crates/provers/plonk/src/prover.rs:

    • Line 7: Adding Sync bounds to F and CS traits for parallelism is necessary. Good change.
    • Line 312-531: Many changes related to parallelizing polynomial evaluations and optimizations in the prover algorithm. The use of rayon for parallelism is done carefully by guarding with #[cfg(feature = "parallel")]. The batch inversion and coset FFT optimizations seem mathematically sound. However, ensure the parallel code is thoroughly tested for both correctness and performance.
    • Line 583-733: More changes related to parallelism and optimizations in the prover algorithm. Similar comments as above apply.
  6. crates/provers/plonk/src/test_utils/circuit_large.rs:

    • This new file defines a scalable test circuit by padding a smaller circuit with identity gates. Looks good and useful for testing.
  7. crates/provers/plonk/src/test_utils/mod.rs:

    • Adding circuit_large module is good. No issues seen.
  8. crates/provers/plonk/src/test_utils/utils.rs:

    • Line 34-71: Changes to use iterative multiplication instead of pow() for efficiency. Looks good.
  9. crates/provers/plonk/src/verifier.rs:

    • Line 326-411: Changes to precompute zeta powers and alpha squared for reuse. Looks good and improves efficiency.

Overall, the PR makes several optimizations for performance while maintaining correctness. The parallelism and batch operations could provide significant speedups. However, ensure thorough testing, especially for the parallelized code. The changes seem well-reviewed and should improve performance.
