Move bytecode expansion into jolt-program and fix CSR-zero handling#1490
Move bytecode expansion into jolt-program and fix CSR-zero handling#1490moodlezoup merged 44 commits intoa16z:mainfrom
Conversation
|
Claude spec review session started: https://claude.ai/code/session_01C4HndRVK9e4cgFodersdgj |
|
Spec Analysis: Bytecode Expansion Crate
Status: Questions remain — 6 ambiguities to resolve before one-shot implementation. Questions: 1. [Goal] Is this PR a single-crate split or a four-crate split? The Acceptance Criteria require all four crates ( 2. [Goal] How does the 3. [Success Criteria] How is 4. [Constraints] What changes about the allocator's concurrency contract? Today 5. [Context] Does 6. [Success Criteria] Which
Generated by Claude Code |
Lock the crate split around jolt-program-preprocess as materialized program preprocessing. Clarify that PCS setup, commitment derivation, prover/verifier preprocessing keys, and bytecode commitment integration remain out of scope for this PR. Capture refactor/crates and bytecode commitment branches as future integration context.
|
@0xAndoroid another spec review round? |
|
Claude spec review session started: https://claude.ai/code/session_01DiZV7TFUjKjBXJHrMasDuW |
|
Spec Analysis: Bytecode Expansion Crate (Round 2)
Status: Approved — The spec is clear enough for one-shot implementation. What changed since round 1: All six prior questions are addressed substantively:
Summary of what will be built:
Key invariants:
Soft suggestions for the implementer (not blocking):
Next step: Run
Generated by Claude Code |
Lock the final JoltProgramPreprocessing name throughout the spec. Clarify that InstrAssembler should borrow the expansion allocator mutably and document the dependency-light fallback for compute_min_ram_K.
Refactor the crate split spec to strengthen the existing jolt-riscv crate instead of introducing jolt-riscv-instructions. Update dependency direction, workspace layout, file lists, verification commands, and implementation checklist so the PR adds three new crates while moving static instruction data below tracer into jolt-riscv.
|
Claude spec review session started: https://claude.ai/code/session_018HpUv9885oDLhRJ5RmCk5u |
|
Spec Analysis:
Status: Approved — The spec is clear enough for one-shot implementation. What changed since Round 2 (re-approved): The spec was restructured in two ways since the prior approval:
Both changes tighten the design rather than loosen it. The "minimum acceptable first pass" escape hatch from earlier drafts is gone — the spec now commits: "The implementation PR should land the full crate cutover described here rather than a partial first pass." Codebase verification:
Summary of what will be built:
Key invariants:
Soft suggestions for the implementer (not blocking):
Next step: Run
Generated by Claude Code |
| - [ ] The verifier-facing path does not depend on CPU execution, lazy tracing, memory-device emulation, advice I/O, or prover-only witness generation. | ||
| - [ ] `cargo tree -p tracer` shows `tracer` depending on `jolt-riscv` and `jolt-program`, but neither lower-level crate depending back on `tracer`. | ||
| - [ ] `cargo tree -p jolt-core --features host` does not require `tracer` solely to name program preprocessing or bytecode rows. | ||
| - [ ] Fixture consistency tests prove byte-for-byte or structure-for-structure parity with the current expansion output for supported RV32 and RV64 instruction families. |
There was a problem hiding this comment.
We should just fully drop RV32 support
| - `src/operands.rs`: own `NormalizedOperands` and operand accessors. | ||
| - `src/normalized.rs`: own `NormalizedInstruction`. | ||
| - `src/format/*.rs`: move instruction format structs from `tracer/src/instruction/format/`. | ||
| - `src/instructions/*.rs`: merge the concrete instruction structs and pure instruction metadata from `tracer/src/instruction/` with the existing Jolt instruction kind wrappers and flag declarations. |
There was a problem hiding this comment.
This is something I was explicitly avoiding in my recent refactor -- we should decide whether the instruction/cycle data structures should be concrete or abstract. I can see arguments for both sides
Frame bytecode expansion as a compiler lowering pipeline with explicit source, target, legality, and resource materialization phases. Document the near-term Rust-first boundary while keeping the grammar MLIR-ready for future dialect work.
Reject CSRRW/CSRRS rows with CSR address 0 instead of emitting a default normalized row that can be skipped by bytecode PC mapping. Remove the circular tracer expansion bridge test now that tracer delegates to jolt-program expansion, and update both bytecode expansion specs to document the CSR-zero correction and the non-circular testing strategy.
| let mut inline = INLINE { | ||
| opcode: metadata & 0x7f, | ||
| funct3: (metadata >> 7) & 0x7, | ||
| funct7: (metadata >> 10) & 0x7f, | ||
| address: instruction.address as u64, | ||
| operands: instruction.operands.into(), | ||
| virtual_sequence_remaining: instruction.virtual_sequence_remaining, | ||
| is_first_in_sequence: instruction.is_first_in_sequence, | ||
| is_compressed: instruction.is_compressed, | ||
| }; | ||
| inline.virtual_sequence_remaining = instruction.virtual_sequence_remaining; | ||
| inline.is_first_in_sequence = instruction.is_first_in_sequence; |
There was a problem hiding this comment.
| let mut inline = INLINE { | |
| opcode: metadata & 0x7f, | |
| funct3: (metadata >> 7) & 0x7, | |
| funct7: (metadata >> 10) & 0x7f, | |
| address: instruction.address as u64, | |
| operands: instruction.operands.into(), | |
| virtual_sequence_remaining: instruction.virtual_sequence_remaining, | |
| is_first_in_sequence: instruction.is_first_in_sequence, | |
| is_compressed: instruction.is_compressed, | |
| }; | |
| inline.virtual_sequence_remaining = instruction.virtual_sequence_remaining; | |
| inline.is_first_in_sequence = instruction.is_first_in_sequence; | |
| let inline = INLINE { | |
| opcode: metadata & 0x7f, | |
| funct3: (metadata >> 7) & 0x7, | |
| funct7: (metadata >> 10) & 0x7f, | |
| address: instruction.address as u64, | |
| operands: instruction.operands.into(), | |
| virtual_sequence_remaining: instruction.virtual_sequence_remaining, | |
| is_first_in_sequence: instruction.is_first_in_sequence, | |
| is_compressed: instruction.is_compressed, | |
| }; |
| for instruction in sequence { | ||
| if let Instruction::VirtualAdvice(advice) = instruction { | ||
| let Some(value) = values.get(filled) else { | ||
| return; |
| pub trait JoltInstruction: | ||
| Copy + Into<NormalizedInstruction> + TryFrom<NormalizedInstruction> | ||
| { |
There was a problem hiding this comment.
might as well add some helper methods here
| pub trait JoltInstruction: | |
| Copy + Into<NormalizedInstruction> + TryFrom<NormalizedInstruction> | |
| { | |
| pub trait JoltInstruction: | |
| Copy + Into<NormalizedInstruction> + TryFrom<NormalizedInstruction> | |
| { | |
| fn normalize(&self) -> NormalizeInstruction { | |
| (*self).into() | |
| } | |
| fn is_virtual(&self) -> bool { | |
| self.normalize().virtual_sequence_remaining.is_some() | |
| } |
| } | ||
| } | ||
|
|
||
| fn ram_access(access: RAMAccess) -> ProgramRamAccess { |
There was a problem hiding this comment.
now that we've flipped the dependency, tracer can use jolt_program::execution::RamAccess directly
| #[cfg(feature = "host")] | ||
| #[test] | ||
| fn normalized_flags_match_expanded_program_bytecode() { | ||
| let mut program = crate::host::Program::new("fibonacci-guest"); | ||
| let (bytecode, _, _, _) = program.decode(); | ||
|
|
||
| for normalized in bytecode { | ||
| let _circuit_flags = normalized.circuit_flags(); | ||
| let _instruction_flags = normalized.instruction_flags(); | ||
| } | ||
| } |
There was a problem hiding this comment.
this test doesn't do anything
| @@ -0,0 +1,117 @@ | |||
| use super::*; | |||
|
|
|||
| pub(super) fn noop_for(instruction: NormalizedInstruction) -> NormalizedInstruction { | |||
There was a problem hiding this comment.
can we add a debug_assert here that instruction.rd is zero?
| #[derive(Debug, Clone)] | ||
| pub struct ExpansionAllocator { | ||
| allocated: [bool; NUM_VIRTUAL_REGISTERS], | ||
| pending_clearing_inline: Vec<u8>, |
There was a problem hiding this comment.
let's add a doc comment to this field
There was a problem hiding this comment.
let's just fold this into assembler.rs
| } | ||
|
|
||
| pub(super) const fn has_side_effects(instruction_kind: InstructionKind) -> bool { | ||
| matches!( |
There was a problem hiding this comment.
I don't like this match statement; ideally "has side-effects" should be declared via the jolt_instruction! macro in jolt-riscv. This problem is downstream of the awkward relationship between InstructionKind and the individual instruction structs declared in jolt-riscv; I think we need to rethink it a bit.
| return Ok(vec![noop_for(*instruction)]); | ||
| } | ||
|
|
||
| match instruction.instruction_kind { |
There was a problem hiding this comment.
Similar for this match statement; ideally we'd be able to just do instruction.expand()
There was a problem hiding this comment.
and each instruction struct from jolt-riscv would implement a VirtualSequence trait
| Add(Add<NormalizedInstruction>), | ||
| Addi(Addi<NormalizedInstruction>), | ||
| Sub(Sub<NormalizedInstruction>), |
There was a problem hiding this comment.
afaict there's no reason to have both this JoltInstructions enum and InstructionKind.
We can add a generic to JoltInstructions:
pub enum JoltInstructions<T = ()> {
Noop,
Add(Add<T>),
Addi(Addi<T>),
Sub(Sub<T>),
// ....
}And then use JoltInstructions instead of InstructionKind everywhere.
Tighten bytecode PC mapper validation, make the expanded-bytecode flag test assert parity, and apply small API cleanups from review. Also ignore Lake build artifacts when embedding zklean templates so workspace clippy is not affected by local .lake directories.
Move jolt-program expansion implementations out of broad category files and into per-instruction modules, keeping shared lowering helpers in local shared modules. No expansion behavior changes intended.
|
Posted by Cursor assistant (model: GPT-5.4) on behalf of the user (Quang Dao) with approval. Thanks, went through these. The mechanical items are now addressed:
The remaining items seem broader than review-nit scope:
|
|
Posted by Cursor assistant (model: GPT-5.4) on behalf of the user (Quang Dao) with approval. Merged latest The important resolution detail: #1503 fixed the old tracer-local inline-sequence implementations for CSR/LR/SC, but this PR has moved expansion ownership into
Added focused Validation run after resolving the merge:
|
|
Posted by Cursor assistant (model: GPT-5.4) on behalf of the user (Quang Dao) with approval. Quick clarification on the relationship between the fixes after merging main/#1503. There are two independent CSR protections now, and they cover different boundaries:
So the tracer path catches unsupported CSR earlier, while the Separately, main/#1503 also added the LR/SC RAM-region fix: LR.W/LR.D/SC.W/SC.D expansions must assert Net result:
|
moodlezoup
left a comment
There was a problem hiding this comment.
This looks good to me. I do think we need to address the InstructionKind vs JoltInstructions issue, but I'm happy to do so in a follow-up
| let [(previous_sequence, _), (new_sequence, _)] = window else { | ||
| unreachable!("windows(2) always yields two entries"); | ||
| }; | ||
| if previous_sequence <= new_sequence { |
There was a problem hiding this comment.
to be more precise, this could be if new_sequence != previous_sequence - 1 right?
There was a problem hiding this comment.
fold this into normalized.rs
Summary
Moves bytecode expansion and verifier-critical program preprocessing out of the tracer-owned path and into the new
jolt-program/ strengthenedjolt-riscvboundary.This PR now includes implementation plus specs:
jolt-riscvowns the normalized instruction vocabulary, operands, kinds, static instruction metadata, and conversion marker traits without depending ontracer.jolt-programowns RV64 program-image decode, bytecode expansion, materialized program/RAM/bytecode preprocessing, and a backend-neutral execution contract.tracer, SDK host helpers, and proof setup call through those lower program-construction APIs instead of owning canonical expansion/preprocessing semantics.jolt-program::expandrejectsCSRRW/CSRRSwith CSR address0asUnsupportedCsr(0), replacing the historical default-row behavior that could produce an address-zero row skipped by bytecode PC mapping.jolt-programexpansion bridge test was removed because it became circular once tracer delegated expansion tojolt-program; remaining coverage uses direct expansion tests and non-circular decode/normalization checks.Specs
specs/bytecode-expansion-crate.mdto describe the implemented crate split, the CSR-zero behavior correction, and the post-cutover testing strategy.specs/compiler-native-bytecode-expansion.mdfor the next extraction-friendly rewrite: a compiler-native lowering core with typed recipes, explicit source/target legality, symbolic temp lifetimes, fixed buffers, and MLIR-ready concepts.Testing
cargo fmt -qcargo nextest run -p jolt-program --cargo-quietcargo nextest run -p tracer jolt_program_rv64_decode_matches_tracer_normalization --cargo-quietcargo clippy -p jolt-program --all-targets -q -- -D warningscargo clippy -p tracer --all-targets -q -- -D warningsgit diff --checkDisclosure: This PR description was updated by Codex, an AI assistant based on GPT-5, on behalf of Quang Dao.