Narrow CI to 3 optimized runners and force AVX2 on Windows#305
Narrow CI to 3 optimized runners and force AVX2 on Windows#305
Conversation
Collapse the contiguous phase-40 through phase-43 debug commits into one checkpoint while preserving the net tree state before phase 44. Co-authored-by: Cursor <cursoragent@cursor.com>
* Instrument Windows asm failures and collect runtime evidence. Add targeted Windows probes and disassembly checks to localize crash offsets and failing asm paths. Co-authored-by: Cursor <cursoragent@cursor.com> * Fix Windows asm addressing to be ASLR-safe. Convert global and table references to RIP-relative forms for Windows asm generation paths. Co-authored-by: Cursor <cursoragent@cursor.com> * Stabilize Windows gcd_unsigned dispatch control flow. Correct dispatch indexing and use explicit compare/branch selection to avoid executing table data. Co-authored-by: Cursor <cursoragent@cursor.com> * Fix Windows CI probe exit-code handling. Ensure PowerShell helpers compare integer process exit codes so successful probes do not fail the workflow. Co-authored-by: Cursor <cursoragent@cursor.com> * Remove temporary Windows debug instrumentation. Drop crash-debug hooks and probes after validation, including the final vdf_fast cleanup. Co-authored-by: Cursor <cursoragent@cursor.com> * Align Windows optimized test coverage with other runners. Run the full optimized test set in the Windows PowerShell test step by removing ad-hoc iteration args and adding prover_test with fast mode. Co-authored-by: Cursor <cursoragent@cursor.com> * Clean up leftover no-op debug checks in fast path and tighten macOS-only branch selection in gcd_unsigned. This removes empty instrumentation cleanup blocks and keeps the dispatch path logic aligned with platform-specific behavior. Co-authored-by: Cursor <cursoragent@cursor.com> * Restore Windows jump-table dispatch path in gcd_unsigned. This reverts an accidental macOS-only condition change from the prior cleanup commit that caused 1weso_test to crash on Windows CI. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
Tighten Windows asm/runtime plumbing and related docs/tests while removing stale duplicate include clutter from vdf headers. Co-authored-by: Cursor <cursoragent@cursor.com>
Include CHIA_WINDOWS in the avx512_add_table addressing branch so Windows emits LEA+ADD RIP-relative access instead of absolute table addressing. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop dead local state that was computed and immediately discarded to avoid implying a missing iteration guard. Co-authored-by: Cursor <cursoragent@cursor.com>
Use end_index instead of size in the jump-table compare to match the mapped index logic and avoid incorrect branch selection. Co-authored-by: Cursor <cursoragent@cursor.com>
… docs. This captures the current branch updates, including the TwoWesolowski position-locking fix and related CMake/parameter/readme adjustments for current CI work. Co-authored-by: Cursor <cursoragent@cursor.com>
Prevent `emu_hw_test` and `emu_hw_vdf_client` from being defined on Windows so CMake does not try to compile sources that depend on POSIX headers. Co-authored-by: Cursor <cursoragent@cursor.com>
Set forms_capacity when allocating FastAlgorithmCallback forms so all WesolowskiCallback subclasses consistently initialize capacity metadata for safe bounds checks. Co-authored-by: Cursor <cursoragent@cursor.com>
Add the same a_end_index range guard used on Linux before the CMP/JE chain so out-of-range values jump to the error path instead of falling through to a kernel label. Co-authored-by: Cursor <cursoragent@cursor.com>
Match the Makefile behavior so compile_asm emits Mach-O-compatible assembly on Intel macOS instead of falling back to Linux/ELF code paths. Co-authored-by: Cursor <cursoragent@cursor.com>
Allow asm_function Windows stack-arg loading to be toggled for internal call sites, and drop dead <excpt.h> from threading since no SEH constructs are used. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep Linux absolute addressing and restore macOS/Linux jump-dispatch parity with main, while making detached-thread fallback logging debug-only. Co-authored-by: Cursor <cursoragent@cursor.com>
This replies to Opus review feedback by fixing macOS gcd_unsigned end-index dispatch, aligning bounds checks across platforms and callbacks, enabling Windows AVX512 CI coverage, and removing unused asm/cast paths. Co-authored-by: Cursor <cursoragent@cursor.com>
Make prover form retrieval value-based and throw on thread-start failure so TwoWesolowski recursion preserves parallel proof generation instead of silently serializing work. Co-authored-by: Cursor <cursoragent@cursor.com>
Require OSXSAVE and XCR0 ZMM/opmask state in init_avx_flags() before enabling AVX-512 IFMA to prevent illegal-instruction crashes on unsupported OS configurations. Co-authored-by: Cursor <cursoragent@cursor.com>
…VX2 on Windows. Run a short 2weso warm-up before the full pass and cut vdf_bench iterations to 10,000 to speed CI while reproducing forced-AVX2 behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Gate every workflow job behind a false condition, keep only short 2weso test invocations in test workflow, and log explicit AVX512/AVX512-IFMA unsupported status alongside AVX2 logs. Co-authored-by: Cursor <cursoragent@cursor.com>
Restore macOS Intel, macOS ARM64, and Ubuntu optimized matrix execution while keeping the longer 2weso run commented out. Co-authored-by: Cursor <cursoragent@cursor.com>
Re-add windows-latest to the optimized matrix so CI again exercises all optimized=1 platforms. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the callback mutex with atomic publish/read coordination so proving threads avoid hot lock contention, while preserving safe handoff of newly computed checkpoints. Also remove CHIA_FORCE_AVX2 wiring in runtime detection, CI, and docs to return AVX2 selection to default detection behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve merge conflicts by keeping branch-intended AVX and two-weso callback behavior while incorporating latest main ancestry.
| const uint64_t power = done_iterations + i * k * l; | ||
| while (!stop_signal && !weso->IsPublished(power)) { | ||
| std::this_thread::yield(); | ||
| } |
There was a problem hiding this comment.
Missing stop_signal check causes crash in GetForm
High Severity
TwoWesolowskiProver::GetForm exits its spin-wait loop when stop_signal becomes true, then unconditionally calls weso->GetFormCopy(power). The new GetFormCopy throws std::runtime_error when IsPublished(power) is false. Since GenerateProof runs in a std::thread with no surrounding try-catch, this unhandled exception triggers std::terminate, crashing the entire process. The analogous code in vdf.h correctly checks stop_signal after the wait loop and returns early before calling GetFormCopy.
Additional Locations (1)
Increase square_asm benchmark iterations from 10k back to 1,000,000 on both Unix and Windows optimized jobs so benchmark runs remain meaningful for performance tracking. Co-authored-by: Cursor <cursoragent@cursor.com>
| if (disable_avx2) { | ||
| bAVX2.store(false, std::memory_order_relaxed); | ||
| } else if (force_avx2) { | ||
| bAVX2.store(true, std::memory_order_relaxed); |
There was a problem hiding this comment.
CHIA_FORCE_AVX2 removed despite PR claiming to use it
Medium Severity
The PR title and description state the goal is to "force AVX2 on Windows via CHIA_FORCE_AVX2=1", but this commit removes all support for CHIA_FORCE_AVX2 from parameters.h (the force_avx2 variable and its else if branch) and from the README.md. The env var is also never set in the CI workflow. The stated purpose of reproducing forced-AVX2 behavior in CI cannot be achieved with these changes.
Additional Locations (1)
Emit tagged machine-readable vdf_bench and AVX dispatch diagnostics, run repeated Windows square_asm benchmarks with warmup and summary artifacts, and temporarily reduce non-target tests to smoke levels to speed regression turnaround. Co-authored-by: Cursor <cursoragent@cursor.com>
Use brace-delimited interpolation for the temporary run progress string so the benchmark investigation step executes correctly on windows-latest. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| { | ||
| const char* value = std::getenv("CHIAVDF_PERF_TRACE"); | ||
| return value != nullptr && value[0] != '\0' && value[0] != '0'; | ||
| } |
There was a problem hiding this comment.
Redundant function duplicates existing helper with different behavior
Low Severity
The new perf_trace_enabled() in vdf_bench.cpp duplicates should_perf_trace() already available from the included parameters.h, but with subtly different logic. perf_trace_enabled() only rejects '0' as falsy, so values like "N", "no", or "false" are treated as true. env_flag() (used by should_perf_trace()) correctly treats 'n'/'N'/'f'/'F' as false. This inconsistency means the same CHIAVDF_PERF_TRACE env var could enable tracing in vdf_bench but not in init_avx_flags.


Summary
test.yamlto three optimized runners (macOS Intel, Ubuntu, Windows)2weso_test 100run before the full2weso_testvdf_benchiterations to 10,000 and force AVX2 on Windows viaCHIA_FORCE_AVX2=1Purpose
Reproduce and monitor forced-AVX2 behavior in CI while keeping cycle time short.
Made with Cursor
Note
Medium Risk
Touches concurrency-sensitive proof/checkpoint publishing logic (mutex removal + atomic ordering) and significantly changes CI/security signal by disabling multiple workflows.
Overview
CI is heavily narrowed/paused. Most GitHub Actions workflows (
build.yml,build-c-libraries.yml,build-riscv64.yml,rust.yml,codeql-analysis.yml,dependency-review.yml,check-commit-signing.yml,hw-build.yml,stale-issue.yml) now have jobs gated behindif: ${{ false }}, effectively disabling them.test.yamlis reduced to a small optimized-only matrix and tweaked for faster perf triage:2weso_testis run as a short smoke (10 iters),prover_testis always run in fast mode,vdf_benchload is reduced, and Windows gains a repeated benchmark harness that writes parsed metrics to the step summary and uploads them as artifacts.For perf investigation, a new
CHIAVDF_PERF_TRACEflag adds machine-readable tracing invdf_benchand extra AVX dispatch diagnostics ininit_avx_flags(). Separately,TwoWesolowskiCallbackremoves a mutex in favor of atomics with explicit publish/consume semantics, and consumers (TwoWesolowskiProver/ProveTwoWeso) now wait for checkpoints to be published before reading forms to avoid races.Written by Cursor Bugbot for commit 2c93d8e. This will update automatically on new commits. Configure here.