This release focuses on performance:
- Execution was rewritten for much better performance and to support our distributed proving architecture.
- All proving after execution, including trace generation, is now supported on Nvidia GPUs through new CUDA kernels and Rust bindings.
Users are advised to upgrade all guest and host crates to openvm v1.4.0.
See CHANGELOG.md for more details.
What's Changed
- feat: make merkle tree finalization parallel by @Golovanov399 in #1701
- fix(new-execution): return segments in `execute_metered` by @shuklaayush in #1702
- refactor: Move `Streams` into `VmStateMut` by @nyunyunyunyu in #1707
- fix(new-execution): remove hardcoded trace heights by @shuklaayush in #1715
- feat: Do not build a `BTreeMap` when we don't need to by @Golovanov399 in #1718
- feat: new execution e4 for memory adapters by @Golovanov399 in #1733
- feat: use memmap instead of paged_vecs on platforms that support it by @pjabbarzade in #1734
- chore: merge `main` branch by @jonathanpwang in #1740
- chore: add serde to segment by @luffykai in #1749
- fix(new-execution): cleanup `vm.rs` by @shuklaayush in #1751
- ci: add benches for `execute_metered` by @shuklaayush in #1752
- chore(new-execution): clk/cycles -> instret/insns by @shuklaayush in #1755
- chore: merge `main` by @jonathanpwang in #1759
- feat: generalize E3 with generic `RecordArena` by @jonathanpwang in #1761
- chore: add execution insn/s logging by @jonathanpwang in #1764
- fix(new-execution): prevent segmentation in single segment executor by @shuklaayush in #1766
- fix(new-execution): use `StdRng` for deterministic execution by @shuklaayush in #1769
- perf(new-execution): use page-based approach for merkle tree metering by @shuklaayush in #1770
- fix(ci): make codspeed instrumentation dispatch only by @shuklaayush in #1771
- feat: RecordArena implementation on DenseRecordArena for variable length records by @arayikhalatyan in #1775
- chore(ci): fix REF_HASH calculation by @jonathanpwang in #1776
- feat(sdk): derive Debug traits in config by @Qumeric in #1777
- chore: make testing ProgramDummyAir mod public by @stephenh-axiom-xyz in #1781
- chore: make RANGE_CHECKER_BUS pub by @stephenh-axiom-xyz in #1783
- chore: add 'allocated' method to DenseRecordArena by @arayikhalatyan in #1785
- chore: fix last chore by @arayikhalatyan in #1786
- refactor(new-execution): make `ExecutionCtrl`s stateless by @shuklaayush in #1789
- fix: proper memory access functions in `execute_e1_impl` in the native poseidon chip by @Golovanov399 in #1790
- chore: use tracing spans for metrics by @jonathanpwang in #1791
- chore: make memory volatile and persistent boundary chips pub by @stephenh-axiom-xyz in #1792
- refactor: use PagedVec for TracingMemory metadata storage by @jonathanpwang in #1793
- feat(new-execution): change default `PAGE_BITS` value in e2 to 6 by @shuklaayush in #1794
- fix(new-execution): no segmentation in single segment executor by @shuklaayush in #1796
- ci(new-execution): run codspeed instrumentation on push by @shuklaayush in #1797
- feat(new-execution): ignore register addr space for memory ops by @shuklaayush in #1798
- fix(new-execution): remove duplicate assignment by @shuklaayush in #1799
- fix: account for E1 / E2 execution in metrics summary by @yi-sun in #1800
- fix(new-execution): disable loads from address space 1 by @shuklaayush in #1802
- fix(new-execution): adapter offset should be boundary_idx + 1 by @shuklaayush in #1803
- chore(ci): fix the instance type for codspeed benches by @jonathanpwang in #1811
- chore: update `stark-backend` commit by @jonathanpwang in #1812
- feat: extract_layout implementation for native Poseidon2 by @stephenh-axiom-xyz in #1813
- chore: make plonky3 nightly-features optional by @jonathanpwang in #1818
- feat: modify fri records + tests by @arayikhalatyan in #1819
- fix: native Poseidon2 record size should be in bytes instead of u32 by @stephenh-axiom-xyz in #1821
- refactor(new-execution): optimize E1/E2 implementation by @nyunyunyunyu in #1827
- fix(openvm-prof): replace unwraps with better error strings by @shuklaayush in #1833
- feat: update architecture for generic `ProverBackend`s by @jonathanpwang in #1836
- feat: use cached max trace heights in leaf/internal aggregation by @stephenh-axiom-xyz in #1839
- fix(new-execution): add register contributions on `reset_segment` by @shuklaayush in #1843
- chore: merge `main` by @jonathanpwang in #1844
- feat: Add pubs needed for gpu by @Golovanov399 in #1860
- feat: RecordSeeker::get_aligned_record_size by @arayikhalatyan in #1863
- chore: General Sys phantom by @nyunyunyunyu in #1867
- fix(new-execution): don't pass total_widths to execute_metered by @shuklaayush in #1868
- feat: public sha2 trace by @matejav in #1881
- feat(new-execution): optimize algebra extension e1/e2 execution by @shuklaayush in #1882
- fix(new-execution): use smaller fib number in `verify_stark` test by @shuklaayush in #1893
- chore: move record arena implementation to separate file by @jonathanpwang in #1894
- fix(new-execution): assign bls12_381 output properly by @shuklaayush in #1895
- chore: Optimistic E3 Execution for Poseidon2 Chip by @nyunyunyunyu in #1896
- perf: remove `step` from `Program` by @jonathanpwang in #1897
- chore: use explicit `Result<_, ExecutionError>` and `pre_compute*` returns `StaticProgramError` by @jonathanpwang in #1898
- chore(perf): use constant for address space in `tracing_read` when possible by @jonathanpwang in #1899
- fix(new-execution): replace segmentation strategy with segmentation limits by @shuklaayush in #1901
- perf(new-execution): use unchecked ops in e2 by @shuklaayush in #1902
- feat: add host configurable memory cell type to `MemoryConfig` by @jonathanpwang in #1903
- fix: native `FriReducedOpeningRecord` size calculation by @teokitan in #1905
- chore: make `SdkVmConfig::to_inner` pub by @jonathanpwang in #1906
- chore: `PairingProverExt` is vacuous by @jonathanpwang in #1907
- chore: clean up benchmarks by @jonathanpwang in #1908
- fix(metrics): cycle tracker spans recorded only with "perf-metrics" by @jonathanpwang in #1910
- chore(tracing): trace individual AIR tracegen by @jonathanpwang in #1911
- feat: `generate_app_proof` always verifies segment proofs by @jonathanpwang in #1912
- chore: remove `DenseRecordArena::current_size` by @jonathanpwang in #1913
- feat: `execute_metered` returns final memory by @jonathanpwang in #1914
- feat: add "stark-debug" feature by @jonathanpwang in #1915
- fix(docs): update isa spec to specify alignment requirements by @shuklaayush in #1917
- fix(new-execution): always inline `execute_impl` by @shuklaayush in #1919
- fix(new-execution): fetch segments properly in kitchen_sink by @shuklaayush in #1921
- refactor(new-execution): make `VmSegmentState` a superset of `VmState` by @shuklaayush in #1922
- perf(new-execution): cache constants in pairing `final_exp_hint` by @shuklaayush in #1923
- feat: `InterpretedInstance` holds pre-computed handlers by @jonathanpwang in #1924
- chore: rename segment.rs to state.rs by @jonathanpwang in #1926
- chore: add `AddressSpaceHostConfig::size` fn for clarity by @jonathanpwang in #1927
- feat(new-execution): add `execute_preflight` benchmark for leaf verification by @shuklaayush in #1928
- feat: switch docs into vocs by @yi-sun in #1929
- fix: setup e12 should also write to match e3 state by @luffykai in #1930
- chore: rename execution traits by @jonathanpwang in #1932
- feat(new-execution): add leaf verifier execute/execute_metered benches by @shuklaayush in #1933
- perf(new-execution): use const generics for memory block_size and align by @shuklaayush in #1935
- fix(perf-metrics): perf instrumentation slow by @jonathanpwang in #1936
- chore: update snark-verifier to v0.2.3 by @Qumeric in #1938
- feat: Field zero initialization by @arayikhalatyan in #1939
- feat: allow to override compiler's Rust toolchain by @Qumeric in #1940
- perf(new-execution): optimize tracing memory functions by @shuklaayush in #1941
- fix(new-execution): revert "chore: update snark-verifier to v0.2.3 (#1938)" by @shuklaayush in #1943
- ci(new-execution): use proper jemalloc settings for execute benchmarks by @shuklaayush in #1944
- feat: Parse struct name in complex field + ECC init macros as string by @jonathanpwang in #1945
- chore: cleanup metrics by @jonathanpwang in #1947
- chore: remove an old `assert!(address_space < NATIVE_AS)` by @Golovanov399 in #1948
- chore: rename `**Step` to `**Executor` by @jonathanpwang in #1949
- chore: return memory tests back by @Golovanov399 in #1950
- chore: rename `JalLuiCoreRecord` for consistency by @jonathanpwang in #1951
- chore(metrics): instrument `total_proof_time_ms` directly by @jonathanpwang in #1952
- chore: update Cargo.lock for alloy-eips v1.0.24 by @jonathanpwang in #1954
- chore(new-execution): make executors stateless by @jonathanpwang in #1955
- chore(new-execution): add warning on hint_random use by @shuklaayush in #1956
- feat: `VmInstance` struct to hold program-specific setup by @jonathanpwang in #1957
- fix(new-execution): rename trait/structs referencing e1/e2/e3 by @shuklaayush in #1958
- feat(new-execution): add execution mode for metering cost by @shuklaayush in #1959
- feat: consolidate tests by @arayikhalatyan in #1960
- chore: log number of instructions executed by @jonathanpwang in #1961
- feat(sdk): update interfaces by @jonathanpwang in #1962
- feat(new-execution): halt on cost going above threshold by @shuklaayush in #1963
- chore: make `set_public_values` pub by @jonathanpwang in #1964
- perf: `AddressMap` with `memmap` use madvise for zeroing by @jonathanpwang in #1966
- refactor(new-execution): move execute fns to execution.rs by @shuklaayush in #1968
- feat: consolidate system tests by @arayikhalatyan in #1969
- fix(new-execution): avoid double counting `insns` across execution modes by @shuklaayush in #1970
- chore(benchmark): add `segment_max_cells` to benchmark CLI args by @jonathanpwang in #1977
- fix(new-execution): warn if `max_constraint_degree` differ by @shuklaayush in #1979
- feat(cli): add openvm version to STARK/EVM proof jsons by @jonathanpwang in #1986
- feat: add `--evm` flag for `cargo openvm setup` and bump versions by @yi-sun in #1987
- feat: add clean install prereqs for Ubuntu and Mac by @yi-sun in #1988
- chore: move some logging to debug level by @jonathanpwang in #1991
- feat(cli): add `--verbose` mode by @jonathanpwang in #1994
- docs(changelog): document change in vk binary format by @jonathanpwang in #1998
- ci(bench): tune jemalloc conf for execute benchmarks by @jonathanpwang in #1999
- docs: update changelog on halo2 verifier contract by @jonathanpwang in #2001
- feat: Fibonacci AIR CUDA trace generation by @stephenh-axiom-xyz
- feat: RangeChecker tracegen by @gaxiom
- feat: RangeTupleChecker tracegen by @bdiehs
- ci: Update tracegen test by @gaxiom
- feat: less_than SubAir CUDA trace generation and tests by @stephenh-axiom-xyz
- feat: added is-zero subrow trace generation in CUDA and tests by @matejav
- feat: BitwiseOperationLookupChip CUDA trace generation and tests by @stephenh-axiom-xyz
- feat: encoder SubAir CUDA tracegen + tests by @stephenh-axiom-xyz
- feat: auipc tracegen GPU by @gaxiom
- feat: Unified trace access by @gaxiom
- feat: is_equal subrow_gen and is_equal_array subrow_gen by @matejav
- chore: Reorganize axiom-gpu repository by @stephenh-axiom-xyz
- chore: versions update to feat/new-execution-e4 by @gaxiom
- feat: auipc chip & test by @gaxiom
- feat: deviceBuffer fills zero by @gaxiom
- ci: delete some ci processes until e4 will be ready by @gaxiom
- chore: Poseidon2 constants init access by @gaxiom
- chore: new-execution-e4 > new-execution by @gaxiom
- fix: all columns should be filled in IsEqualArray tracegen by @stephenh-axiom-xyz
- feat: Poseidon2 subair CUDA tracegen + tests by @matejav
- fix: write array indexing by @gaxiom
- fix: use aux_len for MemoryWriteAuxAdapter by @teokitan
- feat: GPU tracegen test harness by @stephenh-axiom-xyz
- feat: system Poseidon2 GPU tracegen and buffer by @stephenh-axiom-xyz
- feat: cuda tracegen + tests for Rv32HintStore by @arayikhalatyan
- feat: rv32im-jalr-tracegen and tests by @matejav
- feat: rv32im MUL adapter + chip tracegen by @teokitan
- feat: rv32im-jal-lui tracegen + tests by @matejav
- feat: cuda tracegen+tests for rv32 loadstore and load_sign_extend by @arayikhalatyan
- feat: rv32im less_than + ALU adapter/chip + shift tracegen by @teokitan
- feat: branch lt tracegen and branch eq tracegen + tests for both by @matejav
- feat: rv32im-mulh-tracegen + tests by @matejav
- feat: cuda tracegen + tests for divrem by @arayikhalatyan
- chore: lint workflow CI by @stephenh-axiom-xyz
- fix: VariableRangeChecker number of bins should be buffer length by @stephenh-axiom-xyz
- feat: cuda tracegen + tests for native castf and branch eq by @arayikhalatyan
- fix: write actual values to rv32im MUL test by @teokitan
- feat: native field_arithmetic CUDA tracegen by @teokitan
- feat: GPU tracegen for persistent and volatile boundary chip by @stephenh-axiom-xyz
- feat: jal_lui fix by @matejav
- chore: cargo update by @arayikhalatyan
- feat: native Poseidon2 GPU chip and tracegen by @stephenh-axiom-xyz
- feat: cuda tracegen + tests for native FRI by @arayikhalatyan
- feat: rv32 eq_mod adapter + algebra is_eq tracegen by @teokitan
- feat: Keccak tracegen on GPU by @gaxiom
- feat: system PhantomChip CUDA tracegen by @stephenh-axiom-xyz
- feat: system ProgramChip cached + common tracegen by @stephenh-axiom-xyz
- feat: system PublicValuesChip tracegen by @stephenh-axiom-xyz
- feat: hybrid primitive chips by @stephenh-axiom-xyz
- feat: native loadstore + loadstore_native_adapter working tracegen by @teokitan
- fix: MemoryManager double removal by @gaxiom
- feat: memory access adapters tracegen by @Golovanov399
- feat: native field extension by @arayikhalatyan
- feat: native `jal_rangecheck` tracegen by @teokitan
- feat: bigint-tracegen by @matejav
- feat: mod-builder GPU tracegen by @teokitan
- fix: replace `add_range` in rv32im BaseAlu with `add_xor` by @teokitan
- feat: algebra + ecc CUDA tracegen by @teokitan
- feat: update GPU to new traits by @jonathanpwang
- chore: update README with CUDA dev setup and other housekeeping by @jonathanpwang
- feat: native integration tests by @teokitan
- fix: reset stateful chips after tracegen by @jonathanpwang
- fix: `DeviceBuffer::fill_zero_suffix` and make ProgramChip stateless by @jonathanpwang
- feat: bigint integration tests by @matejav
- feat: (temporary) SHA256 hybrid `VmProverExtension` by @jonathanpwang
- fix: FriReducedOpening kernel missing index by local_idx by @jonathanpwang
- feat: GPU SDK by @jonathanpwang
- fix: less_than C++ operator precedence by @jonathanpwang
- fix: JALR cuda kernel missing memory initialization by @jonathanpwang
- fix: keccak padding edge case by @jonathanpwang
- feat(backend): turn on logup chunking for GPU engine by @jonathanpwang
- chore: Remove outdated comments by @Golovanov399
- fix: build script compiles on multi-gpu machine by @jonathanpwang
- feat: add support for riscv test vectors on gpu by @arayikhalatyan
- feat(perf): Update Poseidon2 Tracegen to be compatible with the latest execution by @nyunyunyunyu
- fix: unify mod-builder use of `BigUintGpu` and `OverflowInt` + various bug fixes by @teokitan
- feat: ff_derive, p256, k256, pairing, verify_stark integration tests by @arayikhalatyan
- chore: CUDA debug flags by @gaxiom
- chore: CUDA_DEBUG & parallel build by @gaxiom
- feat: merkle tracegen on gpu + fix a bunch of stuff + make persistent memory inventory on gpu fully work by @Golovanov399
- chore: update openvm and `_init!` macro files by @jonathanpwang
- fix: replace 29 with `timestamp_max_bits` in access adapters by @Golovanov399
- fix(metrics): SystemInventoryGpu updates system cells used by @jonathanpwang
- feat: sha256 cuda tracegen by @matejav
- feat: added do_write=false tests for jalr extension by @arayikhalatyan
- feat: use backend poseidon2 in Poseidon2Buffer by @stephenh-axiom-xyz
- chore: cleanup function signatures by @Golovanov399
- chore: bump versions on Solidity SDK book docs to v1.3 by @yi-sun in #1875
- fix(test): install cli with `--locked` by @jonathanpwang in #1877
- chore: add DeepWiki link by @yi-sun in #1883
- feat: release v1.4.0-rc.0 by @yi-sun in #1887
- chore: add Cantina report for v1.4.0 by @yi-sun in #1984
- perf: execution and tracegen rewrite by @jonathanpwang in #1567
- ci: fix nextest --run-ignored test by @jonathanpwang in #2004
- docs: snapshot docs in S3 on new tag by @jonathanpwang in #2011
- feat(ci): add benches for internal verifier execution by @shuklaayush in #2003
- fix: unused `aggregate_leaf_proofs` warning in `generate-fixtures` by @shuklaayush in #2012
- chore: bump workspace to `v1.4.0-rc.5` by @jonathanpwang in #2015
- fix(cli): force regenerate STARK aggregation keys by @yi-sun in #2022
- feat: add GitHub merge queue configuration for INT-4814 by @devin-ai-integration[bot] in #2017
- chore: reduce warn to info for degree inequality by @yi-sun in #2026
- chore: add missing safety comments by @shuklaayush in #2019
- feat: print message upon verification success by @yi-sun in #2023
- chore: do not download k=24 halo2 params by @yi-sun in #2025
- fix: add merge_group triggers to critical workflows for merge queue by @devin-ai-integration[bot] in #2027
- ci: update runson repository configuration by @jonathanpwang in #2036
- fix(ci): remove `nvcc` from preinstall by @jonathanpwang in #2037
- chore(ci): remove `g4dn` from runner family by @jonathanpwang in #2038
- feat(nightly): execution `become`s faster by @jonathanpwang in #2013
- feat: update docs for SDK updates by @yi-sun in #1990
- feat: tracegen on GPU by @jonathanpwang in #2034
- chore: make encode slice pub by @luffykai in #2040
- feat: add metered execution commands to sdk/cli by @shuklaayush in #2049
- chore: bump stark-backend with CUDA compatibility with target-cpu=native by @jonathanpwang in #2057
- docs: add new security section by @yi-sun in #2048
- chore: cuda code cleanup by @Golovanov399 in #2051
- ci: implement merge queue controller workflow by @jonathanpwang in #2060
- ci: resolve PR blocking by merge queue status check by @jonathanpwang in #2062
- fix: remove halo2 warning by @jonathanpwang in #2061
- chore: bump workspace to `v1.4.0-rc.7` by @jonathanpwang in #2063
- ci: fix merge queue concurrency by @jonathanpwang in #2064
- ci(runs-on): use single label syntax by @jonathanpwang in #2068
- chore: bump workspace to `v1.4.0-rc.8` and add `"cuda"` to CLI by @jonathanpwang in #2069
- ci(benchmarks): add job index by @jonathanpwang in #2073
- chore: mention even hex message in cargo cli by @Golovanov399 in #2065
- ci: remove merge-queue and consolidate CUDA base tests by @jonathanpwang in #2074
- docs: add Rust toolchain section by @yi-sun in #2046
- docs: add domain redirects by @yi-sun in #2071
- ci: run only zkVM ruint tests on gpu by @jonathanpwang in #2075
- docs(book): add installation instructions for CLI with cuda by @jonathanpwang in #2070
- ci: use spot gpu runners by @jonathanpwang in #2076
- docs(specs): new section on distributed proving by @jonathanpwang in #2077
- fix: redirects with slashes by @yi-sun in #2079
- chore: bump workspace to `1.4.0-rc.9` and stark-backend tag to `v1.2.0-rc.7` by @gaxiom in #2078
- fix(docs): small fix to circuit arch doc by @shuklaayush in #2084
- docs: CUDA and non-CUDA contributor setup doc by @stephenh-axiom-xyz in #2081
- docs: use P256Point instead of P256Affine in book by @stephenh-axiom-xyz in #2083
- docs(book): add installation instructions for tco by @jonathanpwang in #2072
- docs: update VM crate documentation by @shuklaayush in #2080
- docs: add execution record docs by @Golovanov399 in #2085
- docs: SDK standard test with p256 + doc update by @stephenh-axiom-xyz in #2066
- fix: update about this book section by @yi-sun in #2088
- ci: speed up sdk test with workflow matrix by @jonathanpwang in #2089
- docs: resolve various incorrect/missing info issues in book by @stephenh-axiom-xyz in #2087
- fix(docs): add details about execution modes by @shuklaayush in #2093
- ci(gpu): use less `test-threads` by @jonathanpwang in #2092
- docs: fix + update book links by @stephenh-axiom-xyz in #2094
- feat: use hybrid CPU tracegen for modular, complex, and ECC extensions by @jonathanpwang in #2086
- feat: remove unnecessary run options from cargo openvm verify stark by @stephenh-axiom-xyz in #2095
- fix: explicit panic on oob memory access by @shuklaayush in #2096
- chore: bump workspace to `v1.4.0-rc.10` by @jonathanpwang in #2097
- docs: update for new VM extension traits by @jonathanpwang in #2090
- chore(readme): update link names by @jonathanpwang in #2099
- docs: Assorted book clarifications and improvements by @stephenh-axiom-xyz in #2098
- ci: use different runners by @jonathanpwang in #2100
- chore(deps): bump tracing-subscriber from 0.3.19 to 0.3.20 by @dependabot[bot] in #2101
- chore: update new-extension docs according to the current openvm by @Golovanov399 in #2103
- chore: add the openvm tags everywhere in the docs by @Golovanov399 in #2104
- chore: update input doc with an example by @shayanh in #2102
- feat: support v1.2/v1.3 guests by @lispc in #2091
- chore: remove `openvm::entry!(main)` from examples by @Golovanov399 in #2105
- chore(cli): CLI version command display enabled special features [cuda] by @jonathanpwang in #2109
- chore(release): v1.4.0 by @jonathanpwang in #2110
New Contributors
Full Changelog: v1.3.0...v1.4.0