Skip to content

improve detection of nanosecond support in gix-fs (#1896) #1897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 19, 2025
Merged

Conversation

Byron
Copy link
Member

@Byron Byron commented Mar 19, 2025

Fixes #1896.

@Byron Byron enabled auto-merge March 19, 2025 00:58
@Byron Byron merged commit 77dbd4b into main Mar 19, 2025
21 checks passed
@Byron Byron deleted the fix-ci branch March 19, 2025 01:31
Copy link
Member

@EliahKagan EliahKagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even when the filesystem has an effective precision of 1 millisecond (whatever its actual precision of representation) -- that is, when its timestamps are fairly coarse -- has_nanosecond_times returns true a significant minority of the time. But successive operations in the journey test then still often have the same recorded timestamp.

Returning true on such a system might sometimes be due to occasional longer-than-usual delays. But I think it is instead due to the operations straddling a representable time change: a is written, then the clock reaches a point that will round or truncate as a different number in the form representable in the filesystem, then b is written. I believe this is sufficient to explain the continued test failures noted in #1896 (comment).

(The same explanation applies equally to the behavior of the code before the changes from this PR. subsec_nanos() is not related to whether we have high precision, so long as the precision is higher than one second--it is nonzero whenever there is a nonzero sub-second portion of the duration measured--and I believe the change here did not really make a difference.)

Quantifying the effect

First, this code, in which the definition of has_nanosecond_times is copied from here, usually outputs 20 / 1000 or higher when run in the Rust playground:

use std::path::Path;

fn main() {
    let max = 1000;
    let mut count = 0;
    for _ in 1..=max {
        let tmp = tempfile::tempdir().unwrap();
        if has_nanosecond_times(tmp.path()).unwrap() {
            count += 1;
        }
    }
    println!("{count} / {max}");
}

fn has_nanosecond_times(root: &Path) -> std::io::Result<bool> {
    let test_file = root.join("nanosecond-test");

    std::fs::write(&test_file, "a")?;
    let first_time = test_file.metadata()?.modified()?;

    std::fs::write(&test_file, "b")?;
    let second_time = test_file.metadata()?.modified()?;

    Ok(second_time.duration_since(first_time).is_ok_and(|d|
            // This can be falsely false if a filesystem would be ridiculously fast,
            // which means a test won't run even though it could. But that's OK, and unlikely.
            d.subsec_nanos() != 0))
}

Examining the timings

Then I changed the program to this:

use std::path::Path;

fn main() {
    let max = 1000;
    let nonzeros: Vec<_> = (0..max)
        .map(|_| {
            let tmp = tempfile::tempdir().unwrap();
            nanoseconds(tmp.path()).unwrap()
        })
        .filter(|time| *time != 0)
        .collect();
    println!("{:?}", nonzeros);
    println!("{} / {}", nonzeros.len(), max);
}

fn nanoseconds(root: &Path) -> std::io::Result<u32> {
    let test_file = root.join("nanosecond-test");

    std::fs::write(&test_file, "a")?;
    let first_time = test_file.metadata()?.modified()?;

    std::fs::write(&test_file, "b")?;
    let second_time = test_file.metadata()?.modified()?;

    Ok(second_time
        .duration_since(first_time)
        .unwrap()
        .subsec_nanos())
}

Here's a typical output:

[1000019, 1000019, 1000019, 1000019, 1000019, 1000019, 1000019, 1000020, 1000020, 1000020, 1000020, 1000020, 1000020, 1000020, 1000019, 1000020, 1000020, 1000020, 1000020, 1000021, 1000020, 1000020, 1000020, 1000020, 1000019, 1000020, 1000020]
27 / 1000

It is similar when I run it locally on a GNU/Linux system.

I also tried running it on Windows:

[999600, 1002000, 997400, 999300, 1002100, 999600, 998400, 998900, 1000600, 999500, 998100, 1001000, 998400, 261100, 514000, 999600, 997300, 1002500, 999700, 1001200, 995300, 999900, 1001600, 997800, 999300, 1000000, 1000200, 999300, 1000200, 1002500, 998100, 999400, 999800, 1000100, 1002800, 999100, 1000000, 1000100, 999700, 1000100, 1000100, 999900, 1000700, 999700, 1001900, 998300, 999400, 1000100, 999800, 1000100, 1000000, 1000600, 999500, 999300, 999900, 1000600, 999900, 1000100, 1002300, 997500, 999500, 999900, 1000100, 999800, 620900, 711300, 527800, 665900, 984800, 995300, 1000700, 961400, 999500, 1000800, 1000600, 999600, 998800, 1000100, 1000000, 999700, 1000400, 1000900, 999300, 999900, 999900, 1001900, 998300, 999800, 1000000, 1000000, 1000800, 999600, 999600, 1000700, 999600, 999800, 999700, 999800, 1000400, 997400, 1000200, 1000600, 999200, 1000000, 999900, 1000700, 999300, 1000600, 999600, 999800, 1000000, 1000700, 999300, 999700, 1003400, 999700, 1000000, 1001000, 999300, 999900, 999500, 1001800, 998300, 1000700, 999900, 1000800, 1000000, 999300, 999400, 999600, 999700, 1000000, 1000000, 1000200, 1001000, 999300, 999900, 999100, 1000200, 1036900, 964700, 998300, 999900, 999100, 1001700, 999000, 998600, 1001000, 999800, 1001000, 1001000, 998600, 797200, 987000, 999900, 1000300, 999700, 1000700, 999300, 1000100, 1000700, 999400, 1000500, 999500, 999700, 1000500, 999600, 1000500, 1000700, 999800, 1000000, 1000000, 1000800, 999500, 999700, 1000200, 999800, 1000100, 1000800, 1001300, 999300, 999700, 999500, 1000300, 1000100, 1000200, 999900, 999400, 999800, 1000100, 1000000, 1001400, 998700, 999700, 1000100, 999900, 1000700, 1001800, 1000200, 1000500, 999400, 1001000, 999100, 999700, 1000000, 999800, 999800, 1000000, 999900, 1000300, 999800, 1000900, 1000700, 999100, 999900, 999900, 1000700, 999600, 999100, 1002100, 998900, 999200, 1000000, 1000400, 999800, 1000200, 999600, 999900, 999900, 1000100, 1000100, 999500, 1002000, 996600, 999200, 998300, 999300, 1000300, 999500, 1001800, 1000500, 999900, 1002700, 997600]
244 / 1000

And on macOS, where the results do reveal a higher precision:

[198414, 168290, 236372, 222789, 169664, 161749, 157123, 222998, 193248, 165915, 197082, 130540, 134706, 128373, 139332, 129499, 114374, 130832, 125874, 129873, 131540, 134248, 121915, 50541, 48458, 47291, 45833, 47832, 46166, 43625, 43708, 51542, 55374, 49416, 65416, 53416, 48041, 48417, 48791, 46417, 47624, 45957, 46666, 44416, 82082, 55916, 46000, 50165, 53082, 53624, 50999, 55166, 50625, 47082, 56832, 53291, 80707, 51625, 62957, 59832, 62666, 57333, 64291, 53333, 67625, 61874, 54624, 60666, 58749, 55666, 60999, 54249, 64708, 53125, 62749, 61625, 52791, 57874, 53000, 57458, 57541, 53541, 50083, 53916, 54749, 56166, 53958, 53874, 54208, 68374, 59833, 63541, 58832, 52916, 51416, 56791, 54416, 58791, 56333, 52791, 73083, 54666, 54291, 53375, 58750, 59666, 60125, 57874, 73082, 65165, 71749, 67166, 68000, 63458, 66291, 67207, 59416, 61041, 59416, 59999, 64958, 66624, 60957, 61624, 63166, 65166, 55708, 54291, 54083, 60957, 56083, 53166, 58416, 57249, 57041, 60625, 53500, 53499, 58749, 49374, 50250, 57624, 59541, 52291, 54832, 49958, 50499, 57208, 47875, 61832, 60041, 54124, 56874, 56917, 58582, 60374, 52874, 58916, 58000, 58458, 60749, 46583, 49916, 56374, 52957, 52750, 49416, 51750, 49375, 53333, 52457, 52874, 53874, 50541, 48416, 47958, 48375, 51416, 51999, 48458, 51332, 50874, 50958, 48166, 48666, 46583, 46667, 49416, 46083, 47083, 48791, 48207, 57916, 53000, 49124, 47250, 49666, 57916, 47707, 49750, 49291, 47625, 54374, 46167, 52749, 43583, 44291, 53916, 45708, 44500, 45291, 55333, 48458, 43124, 41666, 40833, 49749, 46208, 45916, 44416, 42291, 44916, 44708, 46583, 46416, 42458, 46333, 43500, 43124, 44291, 41958, 46166, 44416, 44208, 47333, 41083, 44541, 41375, 41208, 42917, 48041, 41708, 45625, 44333, 44792, 40167, 42541, 40167, 43374, 46833, 44124, 39916, 42625, 42625, 42749, 41458, 41916, 41208, 48041, 47749, 45500, 46958, 44083, 47416, 42541, 40916, 42375, 38583, 45666, 44583, 47667, 40708, 37749, 38791, 48041, 41917, 41249, 37333, 43291, 42166, 41416, 44124, 40374, 45041, 45208, 37916, 40791, 40458, 38750, 42417, 37125, 38333, 37125, 39417, 39333, 36958, 40417, 41833, 36624, 38416, 36250, 37833, 39291, 36458, 36624, 39416, 39208, 40374, 37541, 36000, 36666, 36708, 37583, 38417, 35666, 36333, 35708, 39291, 36083, 35250, 35916, 35874, 39042, 35333, 35833, 35792, 37541, 43708, 43958, 39499, 36624, 40417, 41125, 37417, 36958, 37042, 41417, 41833, 50166, 40791, 40167, 39041, 38083, 40792, 40624, 38958, 39791, 50708, 41083, 37083, 36333, 40041, 36083, 35708, 36125, 35958, 39500, 36208, 40999, 41292, 41916, 41250, 40249, 41916, 40499, 39416, 40749, 39041, 36708, 36750, 38042, 40208, 39208, 42374, 36958, 41750, 42541, 42333, 45666, 40458, 45750, 42499, 39874, 42292, 50791, 353829, 37417, 37416, 37666, 38333, 41208, 36541, 38083, 37083, 37416, 42166, 39625, 38083, 36583, 38166, 39708, 37666, 37624, 46125, 37916, 42458, 37458, 115457, 40500, 38417, 43625, 43125, 46458, 42041, 167124, 42333, 40166, 37249, 37416, 38833, 41291, 37917, 37750, 37583, 38500, 42249, 38000, 38207, 37666, 37583, 39166, 38291, 41667, 37291, 39500, 38792, 36833, 38708, 36291, 36582, 38124, 37291, 40958, 37333, 39458, 38292, 37083, 42333, 37291, 37124, 36791, 36708, 42208, 36624, 36583, 36875, 36541, 40208, 36375, 35791, 36583, 37125, 43958, 42124, 42667, 39624, 40708, 40749, 37417, 37249, 36666, 36416, 39542, 36624, 35917, 37874, 36666, 40416, 36041, 37458, 36292, 40207, 41749, 37416, 36250, 36624, 39499, 37291, 38916, 41708, 38708, 36333, 38083, 34374, 39375, 37708, 46083, 38332, 38833, 37750, 36791, 40625, 36708, 45916, 39208, 37750, 40416, 38874, 42666, 36791, 42875, 40833, 40499, 39499, 44250, 41708, 41542, 38374, 45041, 36625, 37333, 36875, 37541, 36333, 36999, 36874, 37083, 36875, 38750, 34708, 34250, 37124, 41416, 39416, 37583, 38749, 45708, 34833, 44124, 136332, 36875, 34208, 36666, 41708, 37333, 39249, 36750, 37416, 38999, 38000, 39749, 37708, 41916, 38958, 36875, 39374, 37041, 37124, 39083, 39458, 38291, 37208, 36999, 44125, 36666, 39833, 37000, 41333, 45750, 37291, 39208, 34750, 38833, 40374, 32916, 39667, 36416, 37416, 39750, 32750, 34708, 37250, 35999, 39583, 33375, 39625, 32541, 33166, 34874, 33166, 38375, 34291, 37667, 36833, 37041, 38082, 37291, 36916, 43082, 39458, 37208, 36208, 36499, 42874, 37333, 41666, 41833, 43041, 44791, 43458, 41333, 36374, 36833, 39542, 36083, 36541, 37458, 36124, 38667, 36542, 35667, 35833, 36041, 74916, 39166, 37208, 36291, 37875, 41333, 36916, 37208, 36416, 37333, 40749, 36708, 36208, 36042, 37041, 37124, 37708, 37958, 39000, 40750, 37541, 37125, 36708, 37083, 37791, 36125, 36333, 37208, 37583, 39624, 36541, 35749, 36125, 36333, 35542, 36166, 37666, 36374, 38750, 39083, 36292, 38583, 37582, 37208, 34125, 36791, 44708, 38749, 34000, 38500, 37417, 39916, 33250, 33500, 39000, 36874, 33667, 37583, 36750, 36417, 36041, 37624, 36333, 36250, 38000, 38333, 42166, 36833, 33916, 36708, 39208, 45000, 36208, 36458, 36958, 36625, 38666, 36125, 37583, 38375, 33333, 39166, 36458, 39375, 36458, 37042, 39249, 33833, 39208, 36874, 37083, 37207, 37249, 40250, 33666, 33374, 40042, 44749, 44166, 43708, 36458, 33874, 38582, 40124, 34375, 37625, 37500, 39916, 45624, 40666, 39791, 38999, 35083, 41583, 40624, 42125, 38792, 38041, 37874, 38041, 40166, 37874, 39750, 37375, 33375, 42500, 40541, 45208, 34707, 37249, 41499, 33666, 37000, 36791, 37416, 44250, 33458, 38500, 37167, 33917, 38667, 43083, 45250, 33999, 37457, 32750, 116749, 51500, 38041, 47374, 39208, 38791, 36624, 40208, 37583, 37917, 40291, 31916, 33458, 37666, 42375, 36958, 45791, 36583, 39374, 43874, 54166, 37041, 32374, 40125, 43625, 38958, 31917, 31583, 34125, 34083, 36541, 38875, 37625, 36750, 40083, 57749, 39708, 37125, 40333, 37583, 40875, 43208, 42167, 49166, 39958, 41666, 54541, 38624, 47417, 43500, 38208, 41250, 39208, 52833, 42207, 34875, 44500, 36958, 49708, 44208, 41291, 44541, 38833, 42958, 41458, 35208, 36125, 60708, 43166, 34458, 49124, 38082, 34124, 37249, 34833, 38041, 44083, 48124, 41249, 37250, 37041, 47333, 64041, 36583, 49583, 34083, 39292, 36208, 37125, 38542, 45166, 38291, 40916, 37333, 37624, 35375, 39416, 36541, 37167, 36875, 37083, 42833, 35083, 36166, 36125, 36625, 39749, 39541, 36292, 45374, 37958, 39249, 39708, 45124, 517453, 56000, 46624, 41000, 43083, 41708, 44000, 37291, 38458, 40332, 40541, 40874, 38374, 321371, 44499, 38541, 44125, 38792, 40000, 42083, 37041, 39041, 44333, 52958, 41042, 37249, 38291, 39708, 40625, 38583, 37624, 42249, 43208, 45416, 40458, 31250, 38208, 35957, 41458, 39167, 41708, 36875, 47958, 44208, 120915, 36625, 44958, 39499, 39083, 47000, 44875, 47667, 37166, 46874, 36958, 42500, 38624, 44000, 37458, 33291, 39457, 46541, 38375, 33458, 42750, 60166, 49041, 45583, 46833, 49166, 41374, 44750, 47958, 36249, 38832, 191873, 42624, 40582, 51291, 38458, 50166, 45082, 42375, 40374, 37916, 42792, 40375, 45083, 42583, 37125, 47958, 34000, 42667, 47583, 40749, 38958, 33500, 38208, 50041, 49750, 44791, 39083, 38583, 52208, 40458, 32999, 49624, 32041, 48332, 141874, 50792, 44042, 39291, 38041, 46750, 43166, 41875]
1000 / 1000

For my thoughts related more specifically to the code changes in this PR, please see my comment below.

Comment on lines -53 to +56
Ok(first_time != second_time)
Ok(second_time.duration_since(first_time).is_ok_and(|d|
// This can be falsely false if a filesystem would be ridiculously fast,
// which means a test won't run even though it could. But that's OK, and unlikely.
d.subsec_nanos() != 0))
Copy link
Member

@EliahKagan EliahKagan Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems intended to do something different from first_time != second_time, but it effectively behaves the same. The subsec_nanos method is effectively just taking the duration modulo 1 second and expressing the value in nanoseconds. In nearly all cases of unequal timestamps that arise from real-world processes, a difference between timestamps will give a nonzero value of subsec_nanos(), except possibly on a system where subsecond_nanos() always returns 0 due to 1 second precision. As evaluated here, d.subsec_nanos() != 0 is in practice true under almost exactly the circumstances that first_time != second_time was true.

(There are some differences, but they are much rarer than the failures we have observed. If the clock were adjusted such that the timestamps cannot be substracted to give a nonnegative value, or if the difference is nonzero but the time between the two file creation/write operations was curiously long and also an exact multiple of 1 second, then this would evaluate to false. It seems unlikely that those scenarios ever happen when running this on CI. If they do, they would happen far less often than the failures we had before as reported in #1896. Thinking roughly about it: these would be much rarer than network problems causing checkout failure, which I believe could arise due to less extreme timing oddities and which we observe on rare occasion, but the failures are instead much more common than that.)

Although the current failures happen on systems where this is wrongly true, it's worth noting that the tests usually pass on those systems, which makes me wonder if the journey test can be adjusted to make sure a sufficient delay occurs and/or is recorded even on systems with millisecond precision. Furthermore, the code comment here notes that it could be wrongly false even with high precision, on a sufficiently fast filesystem. If so, then if I understand the journey test correctly, it could also produce writes so close together that they have the same timestamp. So if the journey test can be adjusted to work reliably regardless of filesystem precision (so long as we do actually have working modification times, that is), then maybe that would benefit all systems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the analysis! It looks like I am on the 'worst' platform to actually get this right, along with my misperception of subsecond-precision always meaning nanosecond support.

Then it looks like the only way to stabilise this is to see if there is subsecond precision or not, assume millisecond precision if it is, and use that to choose a sleep-duration that will make the writes work.

Beyond that, the test also shows the limits of this snapshot system which is likely to not pick up changes if they happen in short succession.

Copy link
Member

@EliahKagan EliahKagan Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another idea. I think my idea may well be worse than what you have just described, but I'm just finishing up testing it. So I'll open a PR for it so you can take a look.

Edit: I've opened #1899.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FileSnapshot::into_owned_or_cloned intermittently fails according to its journey test
2 participants