[CUDAQ-790] `cc.device_call` lowering to realtime dispatch for local simulators by 1tnguyen · Pull Request #4565 · NVIDIA/cuda-quantum

1tnguyen · 2026-05-21T03:54:08Z

Support lowering of cc.device_call to realtime-based ring-buffer dispatch.

cc.device_call is lowered to three ABI calls that operate on a transport-neutral frame lease, that is, a leased RX/TX slot pair owned by the runtime for the duration of one RPC:

  __cudaq_device_call_acquire_realtime_frame  // lease an in-flight frame
  __cudaq_device_call_dispatch_realtime_frame // publish + wait
  __cudaq_device_call_safely_release_realtime_frame

The compiler writes function arguments directly into the leased RX payload and reads results directly from the leased TX payload.
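
To make the acquire / dispatch / release sequence concrete, here is a minimal host-only sketch of what a lowered call site might look like for a hypothetical `int add(int, int)` device function. Everything in it is an assumption for illustration: `MockFrame`, `mockAcquireFrame`, `mockDispatchFrame`, `mockReleaseFrame` and `loweredAdd` are stand-ins, not the signatures of the `__cudaq_device_call_*` entry points, and the "device function" runs inline instead of going through a ring buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a leased frame: one request (RX) slot and one
// response (TX) slot. The real runtime layout is not described in this PR text.
struct MockFrame {
  std::vector<std::uint8_t> rx; // request payload, written by the caller
  std::vector<std::uint8_t> tx; // response payload, read by the caller
  bool leased = false;
};

// Stand-in for the "acquire" step: hand out a frame with room for both payloads.
inline MockFrame mockAcquireFrame(std::size_t rxBytes, std::size_t txBytes) {
  MockFrame frame;
  frame.rx.assign(rxBytes, 0);
  frame.tx.assign(txBytes, 0);
  frame.leased = true;
  return frame;
}

// Stand-in for the "dispatch" step (publish + wait). Here the callee runs
// inline: it reads two int32 arguments from RX and writes their sum to TX.
inline void mockDispatchFrame(MockFrame &frame) {
  std::int32_t a = 0, b = 0;
  std::memcpy(&a, frame.rx.data(), sizeof a);
  std::memcpy(&b, frame.rx.data() + sizeof a, sizeof b);
  const std::int32_t sum = a + b;
  std::memcpy(frame.tx.data(), &sum, sizeof sum);
}

// Stand-in for the "release" step: give the lease back.
inline void mockReleaseFrame(MockFrame &frame) { frame.leased = false; }

// Shape of a lowered call site: arguments go straight into the RX payload and
// the result is read straight out of the TX payload, with no staging buffer.
inline std::int32_t loweredAdd(std::int32_t a, std::int32_t b) {
  MockFrame frame =
      mockAcquireFrame(2 * sizeof(std::int32_t), sizeof(std::int32_t));
  std::memcpy(frame.rx.data(), &a, sizeof a);
  std::memcpy(frame.rx.data() + sizeof a, &b, sizeof b);
  mockDispatchFrame(frame);
  std::int32_t result = 0;
  std::memcpy(&result, frame.tx.data(), sizeof result);
  mockReleaseFrame(frame);
  return result;
}
```

The point of the sketch is only the ordering: the lease brackets the whole RPC, and both the argument writes and the result read happen while the lease is held.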

Scalars are stored at aligned offsets in the request slot.
std::vector<T> arguments use a length prefix followed by the element bytes, packed in place.
Output std::vector<T>& arguments (for CUDA-QX interop) are signalled via an attribute on cc.device_call and read back through the same zero-copy response slot.
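
The packing rules above can be sketched as a small writer over a byte buffer. This is an illustration under stated assumptions, not the compiler's actual layout: the 64-bit length prefix, the use of `alignof(T)` as the alignment rule, and the names `alignUp` and `SlotWriter` are all hypothetical, and the sketch grows a `std::vector` where the real path would write into a fixed leased slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Round `offset` up to the next multiple of `align` (a power of two).
inline std::size_t alignUp(std::size_t offset, std::size_t align) {
  return (offset + align - 1) & ~(align - 1);
}

// Hypothetical request-slot writer. Each write returns the offset at which
// the value (or the vector's length prefix) was placed.
class SlotWriter {
public:
  explicit SlotWriter(std::vector<std::uint8_t> &slot) : slot_(slot) {}

  // Scalars: placed at the next offset aligned to alignof(T).
  template <typename T> std::size_t writeScalar(const T &value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable scalars can be packed");
    offset_ = alignUp(offset_, alignof(T));
    reserve(sizeof(T));
    std::memcpy(slot_.data() + offset_, &value, sizeof(T));
    const std::size_t at = offset_;
    offset_ += sizeof(T);
    return at;
  }

  // Vectors: an assumed uint64 element count, then the element bytes.
  template <typename T> std::size_t writeVector(const std::vector<T> &values) {
    const std::size_t at = writeScalar<std::uint64_t>(values.size());
    offset_ = alignUp(offset_, alignof(T));
    const std::size_t bytes = values.size() * sizeof(T);
    reserve(bytes);
    if (bytes != 0)
      std::memcpy(slot_.data() + offset_, values.data(), bytes);
    offset_ += bytes;
    return at;
  }

  std::size_t size() const { return offset_; }

private:
  // Grow the backing buffer so that `bytes` more bytes fit at offset_.
  void reserve(std::size_t bytes) {
    if (slot_.size() < offset_ + bytes)
      slot_.resize(offset_ + bytes);
  }

  std::vector<std::uint8_t> &slot_;
  std::size_t offset_ = 0;
};
```

With this layout, writing a `uint8_t`, then a `double`, then a three-element `std::vector<int32_t>` places them at offsets 0, 8 and 16, with the vector's elements starting at offset 24.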

The runtime side (runtime/internal/device_call/) wraps cudaq_ringbuffer_t from CUDA-Q Realtime: a singleton driver routes per-device sessions to a DeviceCallChannel, with two built-in shared-memory channels: device_dispatch (persistent GPU dispatch kernel) and host_dispatch (graph-launch with a pinned mailbox).
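
As a rough picture of the routing described above, the sketch below shows a process-wide singleton that maps a device id to a channel object. It is an assumption-laden illustration only: `SketchChannel`, `SketchDriver` and `sessionFor` are hypothetical names, the real `DeviceCallChannel` interface is not shown in this description, and no ring buffer or GPU work is involved.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical channel interface; stands in for the role of DeviceCallChannel.
struct SketchChannel {
  virtual ~SketchChannel() = default;
  virtual std::string name() const = 0;
};

// Stand-ins for the two built-in shared-memory channels named in the PR.
struct SketchDeviceDispatch : SketchChannel {
  std::string name() const override { return "device_dispatch"; }
};
struct SketchHostDispatch : SketchChannel {
  std::string name() const override { return "host_dispatch"; }
};

// Singleton driver: one session (channel instance) per device id.
class SketchDriver {
public:
  static SketchDriver &instance() {
    static SketchDriver driver; // constructed once, on first use
    return driver;
  }

  // Return the existing session for `deviceId`, or create one of `kind`.
  SketchChannel &sessionFor(int deviceId, const std::string &kind) {
    auto it = sessions_.find(deviceId);
    if (it != sessions_.end())
      return *it->second;
    std::unique_ptr<SketchChannel> channel;
    if (kind == "device_dispatch")
      channel = std::make_unique<SketchDeviceDispatch>();
    else if (kind == "host_dispatch")
      channel = std::make_unique<SketchHostDispatch>();
    else
      throw std::invalid_argument("unknown channel kind: " + kind);
    SketchChannel &ref = *channel;
    sessions_[deviceId] = std::move(channel);
    return ref;
  }

private:
  SketchDriver() = default;
  std::map<int, std::unique_ptr<SketchChannel>> sessions_;
};
```

Keeping the channel behind an interface is what would let a later transport (for example the TCP/IP follow-up mentioned in the commit message) plug in without changing the lowered call sites.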

Notes:

This PR handles local shared-memory transport only; process separation and RDMA transport are not in this branch and are deferred to a follow-up.
CUDA-Q builds without realtime by default; the path is opt-in via CUDAQ_REALTIME_DIR at configure time, which points to a realtime installation.
CI here is build-only: realtime_integration_ci.yml compiles a realtime-enabled CUDA-Q against the realtime installation.

Tested by:

DeviceCallDispatchTester.cu: tests runtime ring-buffer and channel behavior.
End-to-end NVQPP/device_call_realtime_{scalar,array}.cpp tests: exercise the shared-memory and host-dispatch channels with different argument and return types, with simulators in the loop.

Lower CUDA-Q device_call operations to realtime RPC buffers and dispatch through service-backed shared-memory transports. Add runtime channels, GPU and host dispatch support, compiler lowering, nvq++ integration, and focused tests for the core realtime device_call path. Keep helper APIs internal and use generic flat-array arguments. TCP/IP transport support is split to follow-up branch tnguyen/device-call-realtime-tcp.

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

github-actions · 2026-05-21T04:45:19Z

CI Summary (`push`) — ✅ passed

Run #26483004932 · ✅ 6 · ⏩ 7 · ❌ 0 · ⛔ 0

Top-level jobs (13)

Job	Result
`binaries`	⏩ skipped
`build_and_test`	✅ success
`config_devdeps`	✅ success
`config_source_build`	⏩ skipped
`config_wheeldeps`	✅ success
`devdeps`	✅ success
`docker_image`	⏩ skipped
`gen_code_coverage`	⏩ skipped
`metadata`	✅ success
`python_metapackages`	⏩ skipped
`python_wheels`	⏩ skipped
`source_build`	⏩ skipped
`wheeldeps`	✅ success

⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch

Job
`binaries`
`config_source_build`
`docker_image`
`gen_code_coverage`
`python_metapackages`
`python_wheels`
`source_build`

All sub-jobs (42) — every matrix leg, with links

Job	Status	Link
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python)	✅ success	view
Build and test (amd64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, llvm, openmpi) / Dev environment (Python)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Python)	✅ success	view
CI Summary	❔ in_progress	view
Configure build (devdeps)	✅ success	view
Configure build (source_build)	⏩ skipped	view
Configure build (wheeldeps)	✅ success	view
Create CUDA Quantum installer	⏩ skipped	view
Create Docker images	⏩ skipped	view
Create Python metapackages	⏩ skipped	view
Create Python wheels	⏩ skipped	view
Gen code coverage	⏩ skipped	view
Load dependencies (amd64, gcc12) / Caching	✅ success	view
Load dependencies (amd64, gcc12) / Finalize	✅ success	view
Load dependencies (amd64, gcc12) / Metadata	✅ success	view
Load dependencies (amd64, llvm) / Caching	✅ success	view
Load dependencies (amd64, llvm) / Finalize	✅ success	view
Load dependencies (amd64, llvm) / Metadata	✅ success	view
Load dependencies (arm64, gcc12) / Caching	✅ success	view
Load dependencies (arm64, gcc12) / Finalize	✅ success	view
Load dependencies (arm64, gcc12) / Metadata	✅ success	view
Load dependencies (arm64, llvm) / Caching	✅ success	view
Load dependencies (arm64, llvm) / Finalize	✅ success	view
Load dependencies (arm64, llvm) / Metadata	✅ success	view
Load source build cache	⏩ skipped	view
Load wheel dependencies (amd64, 12.6) / Caching	✅ success	view
Load wheel dependencies (amd64, 12.6) / Finalize	✅ success	view
Load wheel dependencies (amd64, 12.6) / Metadata	✅ success	view
Load wheel dependencies (amd64, 13.0) / Caching	✅ success	view
Load wheel dependencies (amd64, 13.0) / Finalize	✅ success	view
Load wheel dependencies (amd64, 13.0) / Metadata	✅ success	view
Load wheel dependencies (arm64, 12.6) / Caching	✅ success	view
Load wheel dependencies (arm64, 12.6) / Finalize	✅ success	view
Load wheel dependencies (arm64, 12.6) / Metadata	✅ success	view
Load wheel dependencies (arm64, 13.0) / Caching	✅ success	view
Load wheel dependencies (arm64, 13.0) / Finalize	✅ success	view
Load wheel dependencies (arm64, 13.0) / Metadata	✅ success	view
Prepare cache clean-up	❔ in_progress	view
Retrieve PR info	✅ success	view

✅ Required checks (6/6) — declared in .github/required-checks.yml for push

Required check	Status	Link
Build and test (amd64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, llvm, openmpi) / Dev environment (Python)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Python)	✅ success	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python)	✅ success	view

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

1tnguyen and others added 2 commits May 21, 2026 03:43

Merge branch 'main' into tnguyen/device-call-realtime

f755060

Fix CI formatting

501b47b

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

1tnguyen force-pushed the tnguyen/device-call-realtime branch 3 times, most recently from f80c892 to f2baa1c Compare May 22, 2026 03:33

Test CI

f193f4c

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

1tnguyen force-pushed the tnguyen/device-call-realtime branch from f2baa1c to f193f4c Compare May 22, 2026 04:00

1tnguyen and others added 11 commits May 24, 2026 23:23

Fix mac CI

e79fd8b

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Merge branch 'main' into tnguyen/device-call-realtime

44ef7a3

Fix build issue

41c6c49

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Fix build issue

baccb87

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Fix build issue

ed26e7a

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Fix build issue

6f42477

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Fix CI issue

c508c74

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Fix CI issue

d6aedac

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Fix CI issue

ffd7d7a

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Fix CI issue

96d60b1

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

Add a validation

df31ca7

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

1tnguyen changed the title ~~[WIP] cc.device_call lowering to realtime dispatch for local simulators~~ [CUDAQ-790] cc.device_call lowering to realtime dispatch for local simulators May 26, 2026

1tnguyen marked this pull request as ready for review May 26, 2026 18:34

1tnguyen requested review from bettinaheim, bmhowe23 and mitchdz as code owners May 26, 2026 18:34

1tnguyen requested review from cketcham2333, schweitzpgi and taalexander May 26, 2026 18:35

bmhowe23 reviewed May 26, 2026

Comment thread realtime/CMakeLists.txt Outdated

Code review: remove unnecessary CMake version file

b20e088

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>

1tnguyen wants to merge 16 commits into NVIDIA:main from 1tnguyen:tnguyen/device-call-realtime

2 participants
