Skip to content

spurious network errors while running CI for Solaris on vmactions/solaris-vm #15521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
psumbera opened this issue May 13, 2025 · 10 comments · May be fixed by #15532
Open

spurious network errors while running CI for Solaris on vmactions/solaris-vm #15521

psumbera opened this issue May 13, 2025 · 10 comments · May be fixed by #15532
Labels
A-networking Area: networking issues, curl, etc. C-bug Category: bug O-solaris OS: Oracle Solaris S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request.

Comments

@psumbera
Copy link
Contributor

psumbera commented May 13, 2025

Problem

We have started to see following network errors:

  warning: spurious network error (3 tries remaining): [28] Timeout was reached (Operation too slow. Less than 10 bytes/sec transferred the last 30 seconds)

as reported here: quinn-rs/quinn#2218

Note that it's running Solaris under QEMU in Linux. I guess that there might have been some changes to QEMU and/or Linux underneeth which enabled these problems. Unfortunatelly main victim seems to be Solaris.

I have come with following workaround. But the cost seems to be longer times...

quinn-rs/quinn#2243

@psumbera psumbera added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels May 13, 2025
@weihanglo
Copy link
Member

Hmm… so 1.86.0 has caused some issues? We did bump curl to from 8.9.0 to 8.12.0 in Rust 1.86: #15162. I wonder if it is possible for you to downgrade to old curl and see what is going on.

This environment could also be set to show more useful debugging info:

CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch

@weihanglo weihanglo added A-networking Area: networking issues, curl, etc. S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. O-solaris OS: Oracle Solaris and removed S-triage Status: This issue is waiting on initial triage. labels May 13, 2025
@psumbera
Copy link
Contributor Author

Hmm… so 1.86.0 has caused some issues? We did bump curl to from 8.9.0 to 8.12.0 in Rust 1.86: #15162. I wonder if it is possible for you to downgrade to old curl and see what is going on.

I don't think it's related to cargo version. I have tried version 1.82 and it had the same problems. But originally when Solaris CI was integrated it used this version (1.82) for some time without issue.

@psumbera
Copy link
Contributor Author

Workaround seems to be to set CARGO_HTTP_MULTIPLEXING=false.

@weihanglo
Copy link
Member

weihanglo commented May 15, 2025

Workaround seems to be to set CARGO_HTTP_MULTIPLEXING=false.

Cargo already has some built-in workaround for disabling multiplexing under some circumstances. Like this and this. However, due to the unusual interaction (Solaris under QEMU), not sure if we'd like to have a similar approach built-in before knowing the root cause.

@psumbera
Copy link
Contributor Author

This environment could also be set to show more useful debugging info:

CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch

I don't see any debug info with this:

CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch
  cargo build --locked --all-targets && cargo test --locked --manifest-path fuzz/Cargo.toml && cargo test --locked -p quinn-udp --benches
  /usr/bin/bash /home/runner/work/_actions/vmactions/solaris-vm/v1/run.sh execSSHSH
  Config file: solaris-11.4-gcc.conf
      Updating crates.io index
   Downloading crates ...
    Downloaded windows-targets v0.52.6
    Downloaded untrusted v0.9.0
    Downloaded subtle v2.6.1
    Downloaded write16 v1.0.0
    Downloaded icu_provider_macros v1.5.0
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 229231 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 29962 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 47660 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 52722 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 16971 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 7525 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 29673 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 18327 bytes remaining to read)
  warning: spurious network error (3 tries remaining): [28] Timeout was reached (download of `zeroize v1.8.1` failed to transfer more than 1 bytes in 90s)
  warning: spurious network error (3 tries remaining): [28] Timeout was reached (failed to download any data for `rustls v0.23.27` within 90s)

@psumbera
Copy link
Contributor Author

Ok I now see it also on my VirtualBox. So I don't think it's QEMU related.

test@virt75:~/quinn-ci-test2$ CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch
    Updating crates.io index
  Downloaded wasm-bindgen-macro v0.2.100
  Downloaded subtle v2.6.1
  Downloaded unicode-ident v1.0.18
  Downloaded bumpalo v3.17.0
  Downloaded wasm-bindgen-backend v0.2.100
  Downloaded rustls-webpki v0.103.3
  Downloaded synstructure v0.13.1
  Downloaded zeroize v1.8.1
  Downloaded wasm-bindgen-macro-support v0.2.100
  Downloaded pin-project v1.1.10
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 8869 bytes remaining to read)
  Downloaded matchers v0.1.0
  Downloaded web-time v1.1.0
  Downloaded js-sys v0.3.77
  Downloaded rustversion v1.0.20
  Downloaded quote v1.0.40
  Downloaded value-bag v1.11.1
  Downloaded signal-hook-registry v1.4.2
  Downloaded rustls-pki-types v1.11.0
  Downloaded windows-targets v0.52.6
  Downloaded rustls v0.23.27
  Downloaded untrusted v0.9.0
  Downloaded wasm-bindgen v0.2.100
  Downloaded thiserror-impl v2.0.12
  Downloaded adler2 v2.0.0
  Downloaded tracing-futures v0.2.5
warning: spurious network error (3 tries remaining): [16] Error in the HTTP2 framing layer
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 27991 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 16310 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 58239 bytes remaining to read)
  Downloaded cfg-if v1.0.0
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 49252 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 34446 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 10223 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 16970 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 42691 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 22654 bytes remaining to read)
  Downloaded wasm-bindgen-shared v0.2.100
  Downloaded shlex v1.3.0
error: failed to verify the checksum of `shlex v1.3.0`

Maybe it can be related to old CURL 7.79.0 (Old Solaris CBE release 11.4.42). While on baremetal with latest Solaris system (CURL 8.12.0) I don't see it...

@weihanglo
Copy link
Member

The aforementioned patch looks like this:

/// Disable HTTP/2 multiplexing for some broken versions of libcurl.
///
/// In certain versions of libcurl when proxy is in use with HTTP/2
/// multiplexing, connections will continue stacking up. This was
/// fixed in libcurl 8.0.0 in curl/curl@821f6e2a89de8aec1c7da3c0f381b92b2b801efc
///
/// However, Cargo can still link against old system libcurl if it is from a
/// custom built one or on macOS. For those cases, multiplexing needs to be
/// disabled when those versions are detected.
fn disables_multiplexing_for_bad_curl(
curl_version: &str,
http: &mut CargoHttpConfig,
gctx: &GlobalContext,
) {
use crate::util::network;
if network::proxy::http_proxy_exists(http, gctx) && http.multiplexing.is_none() {
let bad_curl_versions = ["7.87.0", "7.88.0", "7.88.1"];
if bad_curl_versions
.iter()
.any(|v| curl_version.starts_with(v))
{
tracing::info!("disabling multiplexing with proxy, curl version is {curl_version}");
http.multiplexing = Some(false);
}
}
}

We could do stuff once we know the pattern of the failure.
(sorry I don't have those machines so can only rely on people to test it 😞)

@psumbera
Copy link
Contributor Author

Ok, this can be fixed also via:

--- rustc-1.86.0-src/src/tools/cargo/src/cargo/util/context/mod.rs
+++ rustc-1.86.0-src/src/tools/cargo/src/cargo/util/context/mod.rs
@@ -3020,7 +3020,7 @@
     use crate::util::network;

     if network::proxy::http_proxy_exists(http, gctx) && http.multiplexing.is_none() {
-        let bad_curl_versions = ["7.87.0", "7.88.0", "7.88.1"];
+        let bad_curl_versions = ["7.79.0", "7.87.0", "7.88.0", "7.88.1"];
         if bad_curl_versions
             .iter()
             .any(|v| curl_version.starts_with(v))

Note that CURL is 7.79.0 is bundled with older version of Oracle Solaris CBE 11.4.42 which is still in use with Github vmactions/solaris-vm.

psumbera added a commit to psumbera/cargo that referenced this issue May 16, 2025
CURL 7.79.0 is bundled with Oracle Solaris 11.4.42 CBE release which is still used vi Github vmactions/solaris-vm.

Fixes: rust-lang#15521
@psumbera psumbera linked a pull request May 16, 2025 that will close this issue
psumbera added a commit to psumbera/cargo that referenced this issue May 16, 2025
CURL 7.79.0 is bundled with Oracle Solaris 11.4.42 CBE release which is still used vi Github vmactions/solaris-vm.

Fixes: rust-lang#15521
@weihanglo
Copy link
Member

This environment could also be set to show more useful debugging info:

CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch

I don't see any debug info with this:

Sorry should be something lke

CARGO_HTTP_DEBUG=true CARGO_LOG=network=debug cargo fetch

@psumbera
Copy link
Contributor Author

Sorry should be something lke

CARGO_HTTP_DEBUG=true CARGO_LOG=network=debug cargo fetch

Thank you! Now there is too much data...

fetch.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-networking Area: networking issues, curl, etc. C-bug Category: bug O-solaris OS: Oracle Solaris S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants