
Narrow CI to 3 optimized runners and force AVX2 on Windows #305

Open
hoffmang9 wants to merge 60 commits into main from avx2-force-ci-minimal-runners

hoffmang9 commented Feb 13, 2026

Summary

  • keep test.yaml to three optimized runners (macOS Intel, Ubuntu, Windows)
  • add a short 2weso_test 100 run before the full 2weso_test
  • reduce vdf_bench iterations to 10,000 and force AVX2 on Windows via CHIA_FORCE_AVX2=1

Purpose

Reproduce and monitor forced-AVX2 behavior in CI while keeping cycle time short.

Made with Cursor


Note

Medium Risk
Touches concurrency-sensitive proof/checkpoint publishing logic (mutex removal + atomic ordering) and significantly changes CI/security signal by disabling multiple workflows.

Overview
CI is heavily narrowed/paused. Most GitHub Actions workflows (build.yml, build-c-libraries.yml, build-riscv64.yml, rust.yml, codeql-analysis.yml, dependency-review.yml, check-commit-signing.yml, hw-build.yml, stale-issue.yml) now have jobs gated behind if: ${{ false }}, effectively disabling them.

test.yaml is reduced to a small optimized-only matrix and tweaked for faster perf triage: 2weso_test is run as a short smoke (10 iters), prover_test is always run in fast mode, vdf_bench load is reduced, and Windows gains a repeated benchmark harness that writes parsed metrics to the step summary and uploads them as artifacts.

For perf investigation, a new CHIAVDF_PERF_TRACE flag adds machine-readable tracing in vdf_bench and extra AVX dispatch diagnostics in init_avx_flags(). Separately, TwoWesolowskiCallback removes a mutex in favor of atomics with explicit publish/consume semantics, and consumers (TwoWesolowskiProver/ProveTwoWeso) now wait for checkpoints to be published before reading forms to avoid races.
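
The publish/consume handoff described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `CheckpointStore`, `Publish`, and the integer payload are hypothetical stand-ins for TwoWesolowskiCallback and its forms, and the sketch assumes a single writer that publishes slots in order. Only the names `IsPublished` and `GetFormCopy` are borrowed from the review thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the callback's checkpoint storage.
// Assumes one producer thread that publishes slots 0, 1, 2, ... in order.
class CheckpointStore {
public:
    explicit CheckpointStore(std::size_t capacity)
        : forms_(capacity), published_(0) {}

    // Producer: write the slot first, then publish it with a release store.
    void Publish(std::size_t index, std::uint64_t form) {
        assert(index < forms_.size());
        forms_[index] = form;
        published_.store(index + 1, std::memory_order_release);
    }

    // Consumer: the acquire load pairs with the release store above, so a
    // true result means the slot contents are visible to this thread.
    bool IsPublished(std::size_t index) const {
        return index < published_.load(std::memory_order_acquire);
    }

    // Returns a copy; refuses to read a slot that has not been published.
    std::uint64_t GetFormCopy(std::size_t index) const {
        if (!IsPublished(index)) {
            throw std::runtime_error("checkpoint not published");
        }
        return forms_[index];
    }

private:
    std::vector<std::uint64_t> forms_;
    std::atomic<std::size_t> published_;  // number of slots published so far
};
```

A consumer would poll `IsPublished(i)` before calling `GetFormCopy(i)`. Returning a copy rather than a reference avoids handing out pointers into storage the producer is still extending.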

Written by Cursor Bugbot for commit 2c93d8e.

hoffmang9 and others added 30 commits February 6, 2026 18:38
Collapse the contiguous phase-40 through phase-43 debug commits into one checkpoint while preserving the net tree state before phase 44.

Co-authored-by: Cursor <cursoragent@cursor.com>
hoffmang9 and others added 16 commits February 11, 2026 19:25
* Instrument Windows asm failures and collect runtime evidence.

Add targeted Windows probes and disassembly checks to localize crash offsets and failing asm paths.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix Windows asm addressing to be ASLR-safe.

Convert global and table references to RIP-relative forms for Windows asm generation paths.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Stabilize Windows gcd_unsigned dispatch control flow.

Correct dispatch indexing and use explicit compare/branch selection to avoid executing table data.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix Windows CI probe exit-code handling.

Ensure PowerShell helpers compare integer process exit codes so successful probes do not fail the workflow.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Remove temporary Windows debug instrumentation.

Drop crash-debug hooks and probes after validation, including the final vdf_fast cleanup.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Align Windows optimized test coverage with other runners.

Run the full optimized test set in the Windows PowerShell test step by removing ad-hoc iteration args and adding prover_test with fast mode.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Clean up leftover no-op debug checks in fast path and tighten macOS-only branch selection in gcd_unsigned.

This removes empty instrumentation cleanup blocks and keeps the dispatch path logic aligned with platform-specific behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Restore Windows jump-table dispatch path in gcd_unsigned.

This reverts an accidental macOS-only condition change from the prior cleanup commit that caused 1weso_test to crash on Windows CI.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Tighten Windows asm/runtime plumbing and related docs/tests while removing stale duplicate include clutter from vdf headers.

Co-authored-by: Cursor <cursoragent@cursor.com>
Include CHIA_WINDOWS in the avx512_add_table addressing branch so Windows emits LEA+ADD RIP-relative access instead of absolute table addressing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop dead local state that was computed and immediately discarded to avoid implying a missing iteration guard.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use end_index instead of size in the jump-table compare to match the mapped index logic and avoid incorrect branch selection.

Co-authored-by: Cursor <cursoragent@cursor.com>
… docs.

This captures the current branch updates, including the TwoWesolowski position-locking fix and related CMake/parameter/readme adjustments for current CI work.

Co-authored-by: Cursor <cursoragent@cursor.com>
Prevent `emu_hw_test` and `emu_hw_vdf_client` from being defined on Windows so CMake does not try to compile sources that depend on POSIX headers.

Co-authored-by: Cursor <cursoragent@cursor.com>
Set forms_capacity when allocating FastAlgorithmCallback forms so all WesolowskiCallback subclasses consistently initialize capacity metadata for safe bounds checks.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add the same a_end_index range guard used on Linux before the CMP/JE chain so out-of-range values jump to the error path instead of falling through to a kernel label.

Co-authored-by: Cursor <cursoragent@cursor.com>
Match the Makefile behavior so compile_asm emits Mach-O-compatible assembly on Intel macOS instead of falling back to Linux/ELF code paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
Allow asm_function Windows stack-arg loading to be toggled for internal call sites, and drop dead <excpt.h> from threading since no SEH constructs are used.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep Linux absolute addressing and restore macOS/Linux jump-dispatch parity with main, while making detached-thread fallback logging debug-only.

Co-authored-by: Cursor <cursoragent@cursor.com>
This replies to Opus review feedback by fixing macOS gcd_unsigned end-index dispatch, aligning bounds checks across platforms and callbacks, enabling Windows AVX512 CI coverage, and removing unused asm/cast paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
Make prover form retrieval value-based and throw on thread-start failure so TwoWesolowski recursion preserves parallel proof generation instead of silently serializing work.

Co-authored-by: Cursor <cursoragent@cursor.com>
Require OSXSAVE and XCR0 ZMM/opmask state in init_avx_flags() before enabling AVX-512 IFMA to prevent illegal-instruction crashes on unsupported OS configurations.

Co-authored-by: Cursor <cursoragent@cursor.com>
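
The OS-support gate in the commit message above can be sketched as a pure predicate. This is an illustration only: the function and constant names are hypothetical, and the real `init_avx_flags()` is not shown in this thread. A caller would be expected to take `osxsave` from CPUID leaf 1 (ECX bit 27) and `xcr0` from XGETBV with index 0.

```cpp
#include <cassert>
#include <cstdint>

// XCR0 state-component bits: bit 1 = SSE, bit 2 = AVX,
// bit 5 = opmask, bit 6 = ZMM_Hi256, bit 7 = Hi16_ZMM.
constexpr std::uint64_t kXcr0Sse      = 1u << 1;
constexpr std::uint64_t kXcr0Avx      = 1u << 2;
constexpr std::uint64_t kXcr0Opmask   = 1u << 5;
constexpr std::uint64_t kXcr0ZmmHi256 = 1u << 6;
constexpr std::uint64_t kXcr0Hi16Zmm  = 1u << 7;
constexpr std::uint64_t kXcr0Avx512Mask =
    kXcr0Sse | kXcr0Avx | kXcr0Opmask | kXcr0ZmmHi256 | kXcr0Hi16Zmm;

// Hypothetical helper: true only if the OS saves and restores all AVX-512 state.
// Kept free of CPUID/XGETBV so it can be tested on any machine.
bool os_enables_avx512(bool osxsave, std::uint64_t xcr0) {
    if (!osxsave) {
        return false;  // XGETBV is not usable, so xcr0 carries no information
    }
    return (xcr0 & kXcr0Avx512Mask) == kXcr0Avx512Mask;
}

// Hypothetical helper: the CPU feature bit alone is not sufficient.
bool can_use_avx512_ifma(bool cpu_has_ifma, bool osxsave, std::uint64_t xcr0) {
    return cpu_has_ifma && os_enables_avx512(osxsave, xcr0);
}
```

Separating the bit test from the instructions that read CPU state keeps the decision logic checkable in CI even on runners without AVX-512.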
…VX2 on Windows.

Run a short 2weso warm-up before the full pass and cut vdf_bench iterations to 10,000 to speed CI while reproducing forced-AVX2 behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
@hoffmang9 hoffmang9 changed the base branch from main to nudupl-ci-windows-runner February 13, 2026 18:38
hoffmang9 and others added 3 commits February 13, 2026 11:09
Gate every workflow job behind a false condition, keep only short 2weso test invocations in test workflow, and log explicit AVX512/AVX512-IFMA unsupported status alongside AVX2 logs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Restore macOS Intel, macOS ARM64, and Ubuntu optimized matrix execution while keeping the longer 2weso run commented out.

Co-authored-by: Cursor <cursoragent@cursor.com>
Re-add windows-latest to the optimized matrix so CI again exercises all optimized=1 platforms.

Co-authored-by: Cursor <cursoragent@cursor.com>
Base automatically changed from nudupl-ci-windows-runner to main February 13, 2026 19:28
hoffmang9 and others added 2 commits February 13, 2026 15:36
Replace the callback mutex with atomic publish/read coordination so proving threads avoid hot lock contention, while preserving safe handoff of newly computed checkpoints. Also remove CHIA_FORCE_AVX2 wiring in runtime detection, CI, and docs to return AVX2 selection to default detection behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve merge conflicts by keeping branch-intended AVX and two-weso callback behavior while incorporating latest main ancestry.
const uint64_t power = done_iterations + i * k * l;
while (!stop_signal && !weso->IsPublished(power)) {
    std::this_thread::yield();
}

Missing stop_signal check causes crash in GetForm

High Severity

TwoWesolowskiProver::GetForm exits its spin-wait loop when stop_signal becomes true, then unconditionally calls weso->GetFormCopy(power). The new GetFormCopy throws std::runtime_error when IsPublished(power) is false. Since GenerateProof runs in a std::thread with no surrounding try-catch, this unhandled exception triggers std::terminate, crashing the entire process. The analogous code in vdf.h correctly checks stop_signal after the wait loop and returns early before calling GetFormCopy.
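
One way to close the gap described above is to re-check the exit condition after the wait loop before reading. The sketch below is hypothetical: `FakeWeso` and `TryGetForm` are illustrative names, the payload is a plain integer, and `stop_signal` is assumed to be a `std::atomic<bool>`, which this thread does not confirm.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <thread>

// Hypothetical minimal callback: a single checkpoint behind an atomic flag.
struct FakeWeso {
    std::atomic<bool> ready{false};
    std::uint64_t form = 0;

    bool IsPublished(std::uint64_t /*power*/) const {
        return ready.load(std::memory_order_acquire);
    }
    std::uint64_t GetFormCopy(std::uint64_t power) const {
        if (!IsPublished(power)) {
            throw std::runtime_error("checkpoint not published");
        }
        return form;
    }
};

// Returns true and fills `out` if the checkpoint was published; returns
// false if the wait ended because stop_signal was raised first.
bool TryGetForm(const FakeWeso& weso, const std::atomic<bool>& stop_signal,
                std::uint64_t power, std::uint64_t& out) {
    while (!stop_signal.load(std::memory_order_acquire) &&
           !weso.IsPublished(power)) {
        std::this_thread::yield();
    }
    // The loop has two exit reasons; only read when the data is really there.
    if (!weso.IsPublished(power)) {
        return false;
    }
    out = weso.GetFormCopy(power);
    return true;
}
```

Checking `IsPublished` rather than `stop_signal` after the loop means a checkpoint that arrived just before shutdown is still usable, and the throwing path in `GetFormCopy` is never reached from the worker thread.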

Additional Locations (1)


Increase square_asm benchmark iterations from 10k back to 1,000,000 on both Unix and Windows optimized jobs so benchmark runs remain meaningful for performance tracking.

Co-authored-by: Cursor <cursoragent@cursor.com>
if (disable_avx2) {
    bAVX2.store(false, std::memory_order_relaxed);
} else if (force_avx2) {
    bAVX2.store(true, std::memory_order_relaxed);

CHIA_FORCE_AVX2 removed despite PR claiming to use it

Medium Severity

The PR title and description state the goal is to "force AVX2 on Windows via CHIA_FORCE_AVX2=1", but this commit removes all support for CHIA_FORCE_AVX2 from parameters.h (the force_avx2 variable and its else if branch) and from the README.md. The env var is also never set in the CI workflow. The stated purpose of reproducing forced-AVX2 behavior in CI cannot be achieved with these changes.

Additional Locations (1)


hoffmang9 and others added 2 commits February 13, 2026 17:01
Emit tagged machine-readable vdf_bench and AVX dispatch diagnostics, run repeated Windows square_asm benchmarks with warmup and summary artifacts, and temporarily reduce non-target tests to smoke levels to speed regression turnaround.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use brace-delimited interpolation for the temporary run progress string so the benchmark investigation step executes correctly on windows-latest.

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


{
    const char* value = std::getenv("CHIAVDF_PERF_TRACE");
    return value != nullptr && value[0] != '\0' && value[0] != '0';
}

Redundant function duplicates existing helper with different behavior

Low Severity

The new perf_trace_enabled() in vdf_bench.cpp duplicates should_perf_trace() already available from the included parameters.h, but with subtly different logic. perf_trace_enabled() only rejects '0' as falsy, so values like "N", "no", or "false" are treated as true. env_flag() (used by should_perf_trace()) correctly treats 'n'/'N'/'f'/'F' as false. This inconsistency means the same CHIAVDF_PERF_TRACE env var could enable tracing in vdf_bench but not in init_avx_flags.
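
The divergence can be made concrete with a small side-by-side sketch. `lenient_flag` mirrors the logic quoted from vdf_bench.cpp. `strict_flag` is an assumption reconstructed from this comment's description of `env_flag()` (a leading '0', 'n', 'N', 'f', or 'F' is false); the actual helper in parameters.h is not shown here and may differ. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Same test as the quoted vdf_bench.cpp snippet: unset, empty, or a
// leading '0' is false; everything else is true.
bool lenient_flag(const char* value) {
    return value != nullptr && value[0] != '\0' && value[0] != '0';
}

// Assumed behavior of the shared helper, per the review comment:
// additionally treats a leading n/N/f/F as false.
bool strict_flag(const char* value) {
    if (value == nullptr || value[0] == '\0') {
        return false;
    }
    switch (value[0]) {
        case '0': case 'n': case 'N': case 'f': case 'F':
            return false;
        default:
            return true;
    }
}

// Routing every reader of the variable through one parser keeps
// vdf_bench and init_avx_flags() in agreement.
bool perf_trace_enabled() {
    return strict_flag(std::getenv("CHIAVDF_PERF_TRACE"));
}
```

With `CHIAVDF_PERF_TRACE=no`, the lenient parser reports enabled while the strict one reports disabled, which is the mismatch the comment describes.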

Additional Locations (1)

