Improve filesystem timestamp granularity check #1899

EliahKagan · 2025-03-20T00:04:06Z

This avoids false positives for the check of whether we can proceed to run the filesystem snapshot journey test, by making a hundred files about as fast as we can using ordinary non-parallelized techniques, then checking if all their timestamps are distinct.

This carries a small risk of false negatives, but experiments on macOS, where the journey test is typically able to run, suggest that the rate of false negatives will be low. (False positives are also, in theory, possible, but should be extremely rare or never happen.)

Distinct timestamp experiment

First, I experimented outside of CI on Ubuntu, Windows, and macOS, initially with the code shown in #1897 (review), but after that with this test program:

use std::io::Result;
use std::path::Path;
use std::time::{Instant, SystemTime};

fn main() -> Result<()> {
    let reps = 50;
    let n = 100;

    let mut uniques = Vec::new();
    for _ in 0..reps {
        uniques.push(run_experiment(n)?);
    }
    println!("{:?}", uniques);
    println!("{}, {}", uniques.contains(&1), uniques.contains(&n));
    Ok(())
}

fn run_experiment(n: u32) -> Result<u32> {
    let start = Instant::now();
    let tmp = tempfile::tempdir().unwrap();
    let times = unique_times(n, tmp.path())?;
    let finish = Instant::now();
    println!("{} / {}", times.len(), n);
    println!("{times:?}");
    println!("Experiment took {:?}.", finish - start);
    Ok(times.len().try_into().expect("it's out of n, which is u32"))
}

fn unique_times(n: u32, root: &Path) -> Result<Vec<SystemTime>> {
    let names: Vec<_> = (0..n).map(|i| format!("{i:03}")).collect();
    for name in &names {
        std::fs::write(root.join(name), name)?;
    }
    let mut times = Vec::new();
    for name in names {
        times.push(root.join(name).symlink_metadata()?.modified()?);
    }
    times.sort();
    times.dedup();
    Ok(times)
}

Running that locally, the last two lines of output on Ubuntu were:

[4, 4, 4, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 4, 4, 5, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 4, 3, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4, 3]
false, false

And on Windows (which has 100 ns representation granularity, and which uses a temporary directory that is not on a separate tmpfs-like filesystem):

[25, 25, 25, 24, 24, 25, 26, 25, 24, 24, 25, 26, 26, 24, 26, 25, 25, 24, 24, 23, 25, 25, 24, 23, 23, 24, 24, 26, 28, 26, 24, 23, 24, 23, 23, 23, 24, 23, 24, 25, 23, 23, 24, 24, 24, 30, 25, 26, 24, 23]                                        false, false

And on macOS, which has 1 ns granularity at least on the current default filesystem:

[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]                                                                                                              false, true

For full details, including full output, see this repository.

This is not guaranteed to be that way on all such systems. The use of a different filesystem type or device type could potentially confer high granularity on a system that ordinarily doesn't have it, or result in diminished granularity on a system where it ordinarily is high. If it were guaranteed, then actually checking this would be overkill because it would be sufficient to enable the test on macOS but not other operating systems. That is expected to be what happens on CI as it is currently set up, as well as in most local environments, but not necessarily in all local environments. Therefore:

I think this is better than assuming which platforms work, though it might make sense to have CI fail if the check returns false on macOS.
Before considering this ready, I'm going to do that at least temporarily, to check that my non-CI macOS experiments are similar to CI (i.e. to make sure the test it least sometimes actually running). That's why this is still a draft.

Edit: I've done that. The passing test with this macOS+CI specific assertion when repeating the whole test 250 times can be seen in this run.
Although I've marked this as fixing the issue, it might be better--even if merging this--to view it as a workaround in the hope of developing a better fix later, more along the lines of what you suggested in #1897 (comment). That way, the test could run on all platforms we test on CI, and most other platforms anyone would want to run it on, too.

On the other hand, even then, it might be considered valuable to avoid sleeping during the test if the granularity is high enough that it isn't necessary. (I don't mean to avoid the delay, but rather with the idea that the test might be slightly more robust.) Also, the technique used here should be possible to adapt to infer the timestamp granularity.

CI experiment

The actual fix here is in d4a8d79, but I also changed the test so it runs 250 times instead of once (including separate calls to the helper function each time) before that commit, and changed it back afterwards. This was to make sure that the reason I didn't see failures was that this really does eliminate the failures (or at least make them extremely rare), rather than by chance, since the failures did not always happen before.

This workflow output on my fork shows that the tests passed even before it was changed back from running 250 times to one time. Please note also that this 250-times repetition is of the whole test--it is not related to the "inner" repetition of creating 100 files.

This is to smoke out nondeterministic failures on any platforms that have them, so as to gain greater confidence that the forthcoming fix is effective. The two major matrix job definitions in `ci.yml` are accordingly made so a failing job does not cancel the other jobs, so all jobs that would fail can get as far as to show their failures. This is likewise temporary.

This avoids false positives for the check of whether we can proceed to run the filesystem snapshot `journey` test, by making a hundred files about as fast as we can using ordinary non-parallelized techniques, then checking if all their timestamps are distinct. This carries a small risk of false negatives, but experiments on macOS, where the `journey` test is typically able to run, suggest that the rate of false negatives will be low. (False positive are also, in theory, possible, but should be extremely rare or never happen.)

This raises the risk of a failure when there is no bug, but hopefully only slightly, and it's a simple way to verify that the snapshot `journey` test really is running fully in at least one regularlly tested platform.

Byron

Thanks a million, this is great!

I have a few nits with one suggestion related to performance that may be worth more consideration.
Besides that, please feel free to merge when you think it's good to go.

gix-fs/tests/fs/snapshot.rs

- Decrease n=100 to n=50, which remains decisively enough on all platforms being tested (including Windows). - Slightly streamline iteration by cloning the first iterator so its values need not be collected into a `Vec`. - Make it so that if the CI+macOS high-granularity assertion ever fails, the message explains the significance of the failing `assert_eq!` comparison. (This condenses an added comment and makes it an assertion message instead.) Suggested-by: Sebastian Thiel <[email protected]>

Co-authored-by: Sebastian Thiel <[email protected]>

This reverts commit bf45023.

EliahKagan · 2025-03-20T02:09:36Z

I'm waiting on tests in https://github.com/EliahKagan/gitoxide/actions/runs/13960418689, which verify that the technique still works under the changes (via outer repetition of the whole test, something we don't want to do at the tip of main and which is thus reverted in the next commit). Assuming that still goes well, I'll merge or enable auto-merge.

Edit: 🚀

Byron · 2025-03-20T02:58:39Z

Thanks again for all your help with this! It's huge issue that I introduced and couldn't easily fix, but you resolved it in the shortest possible time 🙏.

EliahKagan · 2025-03-20T04:24:29Z

Thanks! I hope this will not dissuade you from replacing the approach added here with something that would let the test run everywhere, if that can be done without making it work any less well where it already can run. Or from encouraging me to do so, if you have thoughts about how, beyond (or in more detail on) the idea in #1897 (comment).

Byron · 2025-03-20T04:54:03Z

The test was added only because I strangely was missing one, but younger me might have predicted the difficulty in doing so and skipped it. Now that we went through the trouble, I am happy there is a test running at all, even though it won't do so right now on all platforms.
Thus, for now I think it's good enough and a clear improvement to what was (or wasn't) there before, so it's unlikely I get to work in it anytime soon.

Here is my gitoxide backlog, and that's just what I started but didn't finish. There is more bugs hidden in issues that I have to fix with priority.

EliahKagan added 3 commits March 19, 2025 18:41

Assert that macOS on CI is high granularity

d1b0d1c

This raises the risk of a failure when there is no bug, but hopefully only slightly, and it's a simple way to verify that the snapshot `journey` test really is running fully in at least one regularlly tested platform.

EliahKagan force-pushed the run-ci/granularity branch from b21c6c0 to 434bab7 Compare March 20, 2025 00:19

EliahKagan mentioned this pull request Mar 20, 2025

improve detection of nanosecond support in gix-fs (#1896) #1897

Merged

EliahKagan marked this pull request as ready for review March 20, 2025 00:23

Byron approved these changes Mar 20, 2025

View reviewed changes

gix-fs/tests/fs/snapshot.rs Outdated Show resolved Hide resolved

gix-fs/tests/fs/snapshot.rs Outdated Show resolved Hide resolved

gix-fs/tests/fs/snapshot.rs Outdated Show resolved Hide resolved

EliahKagan force-pushed the run-ci/granularity branch from 434bab7 to 33333a5 Compare March 20, 2025 01:55

EliahKagan and others added 2 commits March 19, 2025 22:02

Apply further review suggestion

37aeb2f

Co-authored-by: Sebastian Thiel <[email protected]>

Revert "Temporarily repeat the fs snapshot journey many times"

a5c58dd

This reverts commit bf45023.

EliahKagan force-pushed the run-ci/granularity branch from 33333a5 to a5c58dd Compare March 20, 2025 02:03

EliahKagan enabled auto-merge March 20, 2025 02:23

EliahKagan merged commit 091d994 into GitoxideLabs:main Mar 20, 2025
21 checks passed

EliahKagan deleted the run-ci/granularity branch March 20, 2025 02:26

EliahKagan mentioned this pull request Mar 20, 2025

Should include tempdir creations and recursive removals in per-experiment timings EliahKagan/granularity#1

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Improve filesystem timestamp granularity check #1899

Improve filesystem timestamp granularity check #1899

Uh oh!

EliahKagan commented Mar 20, 2025 •

edited

Loading

Uh oh!

Byron left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

EliahKagan commented Mar 20, 2025 •

edited

Loading

Uh oh!

Uh oh!

Byron commented Mar 20, 2025

Uh oh!

EliahKagan commented Mar 20, 2025

Uh oh!

Byron commented Mar 20, 2025

Uh oh!

Uh oh!

Uh oh!

Improve filesystem timestamp granularity check #1899

Improve filesystem timestamp granularity check #1899

Uh oh!

Conversation

EliahKagan commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Distinct timestamp experiment

CI experiment

Uh oh!

Byron left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

EliahKagan commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Byron commented Mar 20, 2025

Uh oh!

EliahKagan commented Mar 20, 2025

Uh oh!

Byron commented Mar 20, 2025

Uh oh!

Uh oh!

EliahKagan commented Mar 20, 2025 •

edited

Loading

EliahKagan commented Mar 20, 2025 •

edited

Loading