Differences between CPU and GPU compilation regarding rounding floats #228

149segolte · 2025-04-09T18:21:46Z

When using Rust-GPU/cargo-gpu to generate a project that compiles for both CPU and GPU, the same code behaves differently on the two. For example:

fn main() {
    let x = 1743028480i32;
    let y = (x as f64) * (1.0 / (i32::MAX as f64));
    
    println!("{} {}", x, (y * (i32::MAX as f64)) as i32);
}

This prints 1743028480 1743028479 on the CPU, because the as cast truncates 1743028479.9999997769 to 1743028479. On the GPU, however, it comes out as 1743028480. Is this a difference in SPIR-V's casting? I am trying to write code that behaves consistently on both, and these numbers are used as RNG seeds, so even a difference this small leads to wildly different numbers in the dependent code.
Is there a way to perform the cast the Rust way when using rust-gpu?

Answered by Firestar99

Apr 11, 2025

LegNeato · 2025-04-09T22:50:48Z

Maintainer

Rust:

Rust's built-in float-to-integer conversion with the as operator truncates the fractional part, i.e. it rounds toward zero.
For example, 1743028479.999... becomes 1743028479.
Rust Language Reference
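As a quick check of these semantics, here is a small sketch (the helper name cast_variants is made up for illustration) that compares a plain as cast with an explicit trunc() and round() on a value just below 1743028480. It also shows the other cases the Rust reference specifies for float-to-int as casts: negative values round toward zero, NaN becomes 0, and out-of-range values saturate.

```rust
/// Converts `v` to i32 three ways: a plain `as` cast,
/// `trunc()` followed by `as`, and `round()` followed by `as`.
fn cast_variants(v: f64) -> (i32, i32, i32) {
    (v as i32, v.trunc() as i32, v.round() as i32)
}

fn main() {
    // The nearest f64 to this literal lies just below 1743028480.0.
    let v = 1743028479.9999997_f64;
    let (plain, truncated, rounded) = cast_variants(v);
    // `as` and `trunc()` both round toward zero; `round()` goes to the nearest integer.
    println!("as: {plain}, trunc: {truncated}, round: {rounded}");

    // `as` also rounds toward zero for negative values...
    println!("{}", -2.9_f64 as i32);
    // ...maps NaN to 0, and saturates values outside the i32 range.
    println!("{} {}", f64::NAN as i32, 1e20_f64 as i32);
}
```

On the CPU, this means an explicit trunc() before the cast never changes the result of as; it only matters if some backend does not truncate.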

SPIR-V:

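SPIR-V uses the OpConvertFToS instruction for converting floating-point values to signed integers. Here is where it happens in rust-gpu.
The specification defines this conversion as rounding toward 0.0, the same as Rust, so getting 1743028480 suggests that the value was already rounded up to 1743028480.0 before the conversion, or that the driver does not follow the specified rounding.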
SPIR-V Specification

fn main() {
    let x = 1743028480i32;
    let y = (x as f64) * (1.0 / (i32::MAX as f64));

    // Without explicit rounding, differs:
    let without = (y * (i32::MAX as f64)) as i32;
    // Using `trunc()` forces Rust’s truncation:
    let with = (y * (i32::MAX as f64)).trunc() as i32;

    println!("x = {}. Without rounding = {}. With trunc() = {}.", x, without, with);
}

1 reply

149segolte Apr 10, 2025
Author

I get that. But I guess the issue is that it gets optimized away during compilation? I tried trunc(), but it did not help.

Code for CPU gives 1743028479:

#![allow(dead_code)]
#![allow(unused)]
#![allow(incomplete_features)]
#![feature(generic_const_exprs)]

mod data;
mod world_gen;

use data::random::DspRandom;

pub fn main() {
    let index = 515;
    let mut rand = DspRandom::new(index as i32);
    let x = rand.next_f64();    // This has value 0.811660886188811
    println!("{}", (x * (i32::MAX as f64)).trunc() as u32);
}

Code for GPU gives 1743028480 for index = 515:

#![no_std]
#![allow(dead_code)]
#![allow(unused)]
#![allow(incomplete_features)]
#![feature(generic_const_exprs)]

use glam::UVec3;
use spirv_std::num_traits::Float;
use spirv_std::spirv;

mod data;
mod world_gen;

use data::random::DspRandom;

const MAX_SEEDS: usize = 1_000;

#[spirv(compute(threads(64, 1, 1)))]
pub fn compute_shader(
    #[spirv(global_invocation_id)] global_invocation_id: UVec3,
    #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] output: &mut [u32],
) {
    let index = global_invocation_id.x as usize;

    if index < MAX_SEEDS {
        let mut rand = DspRandom::new(index as i32);
        let x = rand.next_f64();
        output[index] = (x * (i32::MAX as f64)).trunc() as u32;
    }
}
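One likely reason trunc() did not help: it can only discard a fractional part that still exists after the multiplication. If the GPU's product already came out as exactly 1743028480.0, there is nothing left to truncate. A way to narrow this down (a suggestion, not something done in the thread) is to write the raw bit patterns of x and of the product into the output buffer and compare them with the CPU run. Since output is a &mut [u32], each f64 would have to be split into two words, for example at output[2 * index] and output[2 * index + 1], which needs a buffer twice as large. The helpers f64_to_words and words_to_f64 below are hypothetical CPU-side Rust; whether f64::to_bits compiles unchanged for the SPIR-V target is an untested assumption.

```rust
/// Splits an f64 into its raw IEEE 754 bits as two u32 words [low, high],
/// so the exact value could be stored in a `&mut [u32]` buffer.
fn f64_to_words(x: f64) -> [u32; 2] {
    let bits = x.to_bits();
    [bits as u32, (bits >> 32) as u32]
}

/// Reassembles an f64 from the [low, high] words produced by `f64_to_words`.
fn words_to_f64(words: [u32; 2]) -> f64 {
    f64::from_bits(u64::from(words[0]) | (u64::from(words[1]) << 32))
}

fn main() {
    let exact = 1743028480.0_f64;
    // The largest f64 strictly below 1743028480.0 (one ULP down).
    let just_below = f64::from_bits(exact.to_bits() - 1);

    // trunc() only changes the result while the product is still below the integer.
    assert_eq!(just_below.trunc() as u32, 1743028479);
    assert_eq!(exact.trunc() as u32, 1743028480);

    // Printing both bit patterns makes the one-ULP difference visible.
    println!("{:x?}", f64_to_words(just_below));
    println!("{:x?}", f64_to_words(exact));
    assert_eq!(words_to_f64(f64_to_words(just_below)), just_below);
}
```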

LegNeato · 2025-04-10T00:39:55Z

Maintainer

Oh, hmm... can you upload the SPIR-T dump somewhere? https://github.com/Rust-GPU/rust-gpu/blob/ac0c7035d53ae0bf87fbff12cc8ad4e6f6628834/docs/src/codegen-args.md#--dump-spirt-passes-dir

2 replies

149segolte Apr 10, 2025
Author

GitHub might truncate the files when displaying them; use the ZIP download to get the full files.
gist link

LegNeato Apr 10, 2025
Maintainer

Hmmm, I see the trunc() there.

LegNeato · 2025-04-10T20:23:03Z

Maintainer

I know @schell and @Firestar99 have written a fair amount of code that runs on both CPU and GPU; I wonder if they have had to work around this.

5 replies

Firestar99 Apr 11, 2025
Maintainer

In general, floating-point results need not be identical across two different machines. Nowadays CPU code is generally compiled without fast-math, so the compiler emits exactly the floating-point operations we write, trading some optimizations for better consistency across compilers and machines. But even that excludes complex floating-point operations such as sin or cos, which are only required to be accurate to within some tolerance.
On the GPU it's an entirely different story. The default is fast-math: floating-point code is optimized as aggressively as possible, multiplies and adds are merged into FMAs, the hardware itself may or may not respect the rounding mode (or may not round at all), and denormals may not be supported, all in the name of speed.
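To make the FMA point concrete, here is a CPU-only sketch (the function name separate_and_fused is made up) showing that fusing a multiply and an add can change the result. Rust does not contract a * b + c into an FMA on its own, so the separate version rounds twice, while f64::mul_add rounds only once, as a GPU compiler's merged multiply-add would. Whether this particular effect is what happens inside DspRandom is not established in this thread.

```rust
/// Computes a * b + c twice: as two separately rounded operations,
/// and as a fused multiply-add with a single rounding at the end.
fn separate_and_fused(a: f64, b: f64, c: f64) -> (f64, f64) {
    let separate = a * b + c; // the product is rounded before the add
    let fused = a.mul_add(b, c); // one rounding at the end, like an FMA
    (separate, fused)
}

fn main() {
    let (separate, fused) = separate_and_fused(0.1, 10.0, -1.0);
    // 0.1 * 10.0 rounds to exactly 1.0, so the separate version yields 0.0;
    // the fused version keeps the tiny representation error of 0.1 and yields 2^-54.
    println!("separate = {separate:e}, fused = {fused:e}");
}
```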
If you want precision, use f64 and put up with it being 32-64 times slower, but at least you get proper IEEE floats!

Or you can try to play around with SPV_KHR_float_controls, which I don't think we support yet, and hope nothing within the compiler chain screws up.

In general, using floating point for an RNG seed sounds like a really bad idea if you need determinism. Instead, keep the RNG state in integers and use some integer math to generate a float in [0, 1), then go from there. Feel free to reference my GpuRng.
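A minimal sketch of the integer-to-float step, assuming a u32 random value is already available from some integer RNG (unit_f32 is a hypothetical name, not taken from GpuRng). It uses the top 24 bits: a 24-bit integer fits exactly in the f32 significand, and the scale factor 2^-24 is a power of two, so every step is exact and the result should not depend on rounding mode, FMA contraction, or denormal handling (the smallest nonzero result, 2^-24, is a normal number).

```rust
/// Maps a 32-bit random integer to an f32 in [0, 1) using its top 24 bits.
/// Both the integer-to-float conversion and the scaling are exact in f32.
fn unit_f32(x: u32) -> f32 {
    const SCALE: f32 = 1.0 / 16_777_216.0; // 2^-24
    (x >> 8) as f32 * SCALE
}

fn main() {
    for x in [0u32, 0x8000_0000, u32::MAX] {
        println!("{:#010x} -> {}", x, unit_f32(x));
    }
}
```

The trade-off is that only 2^24 distinct values are produced, but each one is bit-identical wherever IEEE single precision is implemented.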

Answer selected by 149segolte

149segolte Apr 11, 2025
Author

Yup, I do find that to be a better approach. But my current project is a port of another one, and I have to mirror its random number generation exactly.
Thanks for the help. @Firestar99

LegNeato Apr 11, 2025
Maintainer

FWIW, I haven't tried it myself, but for the future, it looks like rand has no_std support and uses libm (which we replace with intrinsics when building for SPIR-V).

Firestar99 Apr 12, 2025
Maintainer

True, but note my top comment on why rand may not be the best choice:

A PCG PRNG that is optimized for GPUs: it is fast to evaluate and accepts sequential IDs as its initial state without sacrificing RNG quality.
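GpuRng's code isn't reproduced in this thread. As an illustration of the idea only, here is a PCG-style hash: one step of PCG's 32-bit LCG (multiplier 747796405, increment 2891336453) followed by its RXS-M-XS output permutation. The function name pcg_hash is illustrative. Every step is a bijection on u32, so consecutive seeds such as invocation indices always map to distinct, well-scrambled values, and because it is pure integer math, CPU and GPU should produce identical results.

```rust
/// PCG-style hash: one LCG step, then the RXS-M-XS output permutation.
/// Uses only wrapping integer arithmetic, so the result is platform-independent.
fn pcg_hash(input: u32) -> u32 {
    let state = input.wrapping_mul(747_796_405).wrapping_add(2_891_336_453);
    // Random xorshift: the shift amount (4..=19) comes from the top 4 bits.
    let word = ((state >> ((state >> 28) + 4)) ^ state).wrapping_mul(277_803_737);
    (word >> 22) ^ word
}

fn main() {
    // Seed one value per "invocation" directly from its sequential index.
    for index in 0u32..4 {
        println!("{}: {:#010x}", index, pcg_hash(index));
    }
}
```

On the GPU, the same function could be fed global_invocation_id.x, and the result passed through an integer-to-[0, 1) mapping when a float is needed.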

schell Apr 18, 2025
Maintainer

Indeed, I've found that numbers come out differently on CPU and GPU, but I haven't been surprised by it yet. I have a lot of code that does GPU things on the CPU (like cubemap sampling), so I expect to need a certain amount of fudging when doing comparisons.

I think if any computations must be perfectly identical at every step on CPU and GPU, they probably shouldn't be done in floating point. It's a tough constraint, but masking the differences would take quite a tricky bit of magic and would likely be frustrating to work around.
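The thread doesn't say what kind of fudging is used. One common approach, sketched here with made-up function names, is to compare CPU and GPU results by their distance in units in the last place (ULPs) instead of requiring exact equality. The max_ulps threshold is an application-specific choice, and this only helps where small differences are acceptable; it cannot rescue values like the RNG seed above that must match exactly.

```rust
/// Maps an f32 to an integer whose ordering matches float ordering,
/// so adjacent representable floats map to adjacent integers.
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    if bits < 0 {
        // Negative floats have the sign bit set; mirror them so that
        // larger magnitudes sort lower and -0.0 lands on 0.
        i64::from(i32::MIN) - i64::from(bits)
    } else {
        i64::from(bits)
    }
}

/// Distance in ULP steps between a and b (+0.0 and -0.0 count as equal).
/// Returns None if either input is NaN.
fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
}

/// Treats a CPU and a GPU result as matching if they are within `max_ulps`.
fn nearly_equal(cpu: f32, gpu: f32, max_ulps: u64) -> bool {
    matches!(ulp_distance(cpu, gpu), Some(d) if d <= max_ulps)
}

fn main() {
    let next_after_one = 1.0 + f32::EPSILON; // the next f32 above 1.0
    println!("{:?}", ulp_distance(1.0, next_after_one));
    println!("{}", nearly_equal(1.0, next_after_one, 4));
}
```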
