Segmentation fault on nightly and beta when running with --release #49010
Comments
I am also experiencing this in https://github.com/termhn/nano-rs on |
This fixes it tokio-rs/tokio-timer#40 ... not sure what that means or if it is helpful (some optimization to do with continue breaking things maybe?) |
cc @rust-lang/compiler |
Do we know if this is a regression? Can one of you try to reproduce on older nightly builds and see if you can find some point that worked? |
My guess would be that it is some kind of optimization gone wrong in LLVM, but whether that's LLVM's fault or ours is sort of unclear. |
I'd test that, but unfortunately I could not find out how to revert to older nightly/beta versions using rustup :/ |
You can use |
I checked older nightly versions and the segfault appears first when using |
@joeschman Thanks! Btw, jfyi there is a new shiny tool by @Mark-Simulacrum that can do the bisection for you: https://github.com/rust-lang-nursery/cargo-bisect-rustc It will even find the problem down to the PR level if it arose in the last 90 days. In any case, these are the diffs between those two nightlies, if I'm not mistaken: |
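For anyone following along, the idea behind bisecting nightlies can be sketched in a few lines. This is only a minimal illustration of a binary search over an ordered list of builds, not how cargo-bisect-rustc is actually implemented; the function name `first_bad`, the 90-build range, and the index 37 are all made up for the example, and the closure stands in for "install that nightly, run with `--release`, check for a segfault".

```rust
/// Hypothetical helper (not part of any real tool): returns the first index
/// for which `is_bad` is true, assuming `good` is known good, `bad` is known
/// bad, `good < bad`, and every build after the first bad one is also bad.
fn first_bad(mut good: usize, mut bad: usize, is_bad: impl Fn(usize) -> bool) -> usize {
    // Invariant: `good` is always a known-good index, `bad` a known-bad one.
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        if is_bad(mid) {
            bad = mid; // regression is at `mid` or earlier
        } else {
            good = mid; // regression is after `mid`
        }
    }
    bad
}

fn main() {
    // Stand-in for actually building and running the reproducer on a nightly.
    // Here we pretend the regression first appears at build index 37.
    let segfaults = |nightly: usize| nightly >= 37;
    let first = first_bad(0, 89, segfaults);
    println!("first bad nightly index: {}", first);
}
```

With 90 candidate builds this needs about seven test runs instead of up to 89, which is why bisecting by hand or with a tool is worth it even when each run is slow.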
One thing jumps out at me: Upgrade to LLVM 6 =) |
Indeed. Definitely seems like the kind of bug a new LLVM version could introduce. |
cc @rust-lang/compiler -- anybody want to try to track down a potential LLVM bug introduced by LLVM 6 transition? @nagisa seems like your speciality :) |
Okay, I’ll try to take a look at this tomorrow. |
Ugh, I cannot reproduce the fault on Linux, so it is very hard for me to even start investigating this more seriously. The supposed error location is within a huge inlined 1.5k line blob of IR and the function with and without the fix at tokio-rs/tokio-timer#40 appears to be optimised down into identical code, at least for that specific function. |
Is there anything I could provide from the Windows side that would help with this, such as build artifacts? I’m not super knowledgeable about everything that happens, but I'm willing to help if I can :)
|
Can you see if the issue is reproducible on windows with the minimal sample provided in the first comment? |
If there's any more information I can provide for the problem on macOS, just tell me what ;) |
Wrote this comment yesterday but forgot to hit submit. So here’s the thing that sticks out like a sore thumb to me in the optimised IR, as far as differences between the "bad" and "fixed" versions go:
%min.0.ph32.i.i.i = phi i64* [ null, %bb8.lr.ph.i.lr.ph.lr.ph.i.i.i ], [ %.lcssa23.i164.pre-phi.i.pre-phi.i.pre-phi, %bb11.i165.i.i ]
; ...
%328 = icmp ne i64* %min.0.ph32.i.i.i, null
; a while later
call void @llvm.assume(i1 %328) #13, !noalias !213
I still haven’t checked the IR closely enough to confirm whether the CFG allows this assume to ever be called on a falsy value, but this is really the only notable difference between the fixed and non-fixed versions anyway. Surprisingly, this |
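As background for readers who don't stare at IR often: one plausible reading of a `phi` that starts at `null` followed by a non-null check is an `Option<&u64>` that starts out as `None`. The sketch below is a hypothetical stand-in, not the tokio-timer code, and whether it matches the actual source behind that IR is an assumption; `min_ref` is a name invented for the example. It only demonstrates that `Option<&T>` is pointer-sized, with `None` taking the null slot.

```rust
use std::mem::size_of;

/// Hypothetical "find the minimum" loop, written only to illustrate the
/// representation; it is not taken from tokio-timer.
fn min_ref(values: &[u64]) -> Option<&u64> {
    // `None` is stored as a null pointer, so this variable starts as null
    // at the machine level, much like a phi whose first incoming value is null.
    let mut min: Option<&u64> = None;
    for v in values {
        match min {
            // Keep the current minimum if it is already small enough.
            Some(m) if *m <= *v => {}
            // Otherwise (no minimum yet, or `v` is smaller) take `v`.
            _ => min = Some(v),
        }
    }
    min
}

fn main() {
    // `Option<&T>` uses null as its `None` niche, so it is exactly pointer-sized.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<*const u64>());

    let data = [7u64, 3, 9];
    assert_eq!(min_ref(&data), Some(&3));
    assert_eq!(min_ref(&[]), None);
}
```

If the optimiser is told to assume such a pointer is non-null on a path where it can in fact still be `None`, that is undefined behaviour at the LLVM level, which would fit a crash that only shows up with optimisations enabled.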
Things that would be helpful and make it easier to debug this:
A minimal reproducer that works on Windows would be better, as I have access to a Windows VM, though perhaps it is as good a time as ever to figure out how to make a mac VM… hmm. |
Assigning to nagisa, but it seems like they are stuck without some way to reproduce. |
Assigning to @pnkfelix to try and reproduce on mac |
I have now minimized the example. All the code is in the repository, with no dependencies on crates outside the repository. Unfortunately, a one-file reproducer was not possible: the bug seems to occur only when the code is called from another crate. I reduced that crate to 20 lines of code. When the same code is run from an internal module, no segfault occurs. Furthermore, I found out that with yesterday's nightly build (
Any recommendations for a debugger for rust on a mac? I have hardly any experience with using debuggers at all. |
Alright so I was testing on my Windows machine and confirmed the same behavior as @joeschman has on mac. I also pinpointed that it is fixed in |
I have reproduced the problem on my mac, using the joeschman/tokio-timer-segfault repo and the |
Here is a gist https://gist.github.com/pnkfelix/fdab1b374d49e8850073a357d4f492f4 with the things that @nagisa had asked for (a stack backtrace, dump of the register file, and a disassembly of the function). |
@joeschman by the way, the way I got the info in the above gist is I just ran the binary under
and then when I got the debugger prompt (
and then after the program exception occurred, I used the following commands (included in the gist output above)
|
@nagisa pointed out that I was using an old head of the source repo. I updated and re-ran the same |
Experimentation and Discussion with @nagisa and @alexcrichton has led us to the hypothesis that this is a bug injected by thinlto, which is on by default for |
|
I doubt these are the culprits, but skimming the commit log shows @alexcrichton's "rustc: Tweak funclet cleanups of ffi functions", which seems ... at least in the neighborhood. |
Funclets would only affect certain windows targets and nothing else.
|
We may need to appoint someone to be the official "LTO debugging" expert based on the number of bugs we currently have that have sort of stalled at the point of determining "oh, this appears to be injected by [Thin]LTO." |
triage: P-medium Next steps are to diagnose the LLVM problem. Filing under #50422. |
Triage: looks like the energy to reproduce this one kinda petered out. Can anyone reproduce this today? |
One year later, with no reproduction instructions, I'm going to give this one a close. If anyone can make a reproduction, please let me know and we can re-open, thanks! |
I encountered a segmentation fault when running a program on nightly or beta in release mode.
The segfault occurs somewhere in another crate, but within safe code, and only if --release is passed to cargo.
I created a crate which is a minimal setup to reproduce this error:
https://github.com/joeschman/tokio-timer-segfault
A more detailed description can be found in the readme.