heisenbug: debug builds Integer::repr_discr has transient ICE on compile-fail/enum-discrim-too-small2.rs #47381
Comments
(It seems easy enough to add another |
Hmm. Apparently we have |
(I am now suspecting that there is a |
FYI the test in question is https://github.com/rust-lang/rust/blob/master/src/test/run-pass/discrim-explicit-23030.rs I'm double checking what's going on in the layout code now, to see if it's something obviously wrong with these explicitly assigned values... |
Update: Something might have been terribly wrong with my builds. Things went south in other very strange ways while I was working on this. |
Yes, as far as I can tell, at least some of the problems I was observing seem to have gone away after Closing this ticket. |
(While I am not yet willing to reopen this ticket, I am seeing this same scenario arise on two different computers, which makes me think that while it may be some transient issue, it's something we still may need to look more carefully into. But whatever's going on, the fact that it does not consistently reproduce is a sign of a deeper problem than what this bug description indicates...) Update: to be clear: It consistently reproduces for some compiler builds. That is, I can re-run the test over and over and see it. The problem is that small unrelated changes to the |
... just saw it arise again (on a build that I am pretty sure was working a little while ago...). For the record, on my Linux box, this is the command line I am invoking for each build:
I am becoming tempted to start digging into why this keeps cropping up for me... |
For the record, I am seeing cases where (presumably inlined) calls to Integer::fit_signed are returning the wrong value for edge cases. That is, after instrumenting the panic message with more data, I get a message that And that should not be happening. (That is, it is presumably some sort of code generation bug.) |
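For reference, here is a minimal sketch of what a correct build should return at the boundaries of each arm. It assumes a local stand-in `Integer` enum (hypothetical, not rustc's actual layout type) and uses the current `..=` range-pattern syntax; the ranges themselves are the ones in `fit_signed`.

```rust
// Hypothetical stand-in for rustc's layout `Integer`; only the variants
// needed for this illustration are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Integer {
    I8,
    I16,
    I32,
    I64,
    I128,
}

/// Find the smallest `Integer` that can represent the signed value.
fn fit_signed(x: i128) -> Integer {
    match x {
        -0x80..=0x7f => Integer::I8,
        -0x8000..=0x7fff => Integer::I16,
        -0x8000_0000..=0x7fff_ffff => Integer::I32,
        -0x8000_0000_0000_0000..=0x7fff_ffff_ffff_ffff => Integer::I64,
        _ => Integer::I128,
    }
}

fn main() {
    // Each pair is (input, expected width), sitting on or just past a boundary.
    let cases: [(i128, Integer); 10] = [
        (i8::MIN as i128, Integer::I8),
        (i8::MAX as i128, Integer::I8),
        (i8::MIN as i128 - 1, Integer::I16),
        (i8::MAX as i128 + 1, Integer::I16),
        (i16::MAX as i128 + 1, Integer::I32),
        (i32::MAX as i128 + 1, Integer::I64),
        (i64::MIN as i128, Integer::I64),
        (i64::MAX as i128, Integer::I64),
        (i64::MIN as i128 - 1, Integer::I128),
        (i64::MAX as i128 + 1, Integer::I128),
    ];
    for &(x, expected) in cases.iter() {
        assert_eq!(fit_signed(x), expected, "wrong width for {}", x);
    }
    println!("all edge cases map to the expected width");
}
```

Any build where one of these assertions fails would be showing the same kind of wrong answer described above.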
Reopening ticket. I'm seeing this on a build on my Mac (it can crop up on either Linux or Mac), so I'm going to spend some time digging into the generated machine code and try to get to the bottom of what's going on. |
Okay so here's what I've found after much investigation (and learning how to get Visual Studio Code to do various breakpoint machinations): Stepping through the instructions, it seems like the essential problem is arising from a miscompiled instruction sequence corresponding to an inlined call to
/// Find the smallest Integer type which can represent the signed value.
pub fn fit_signed(x: i128) -> Integer {
    match x {
        -0x0000_0000_0000_0080...0x0000_0000_0000_007f => I8,
        -0x0000_0000_0000_8000...0x0000_0000_0000_7fff => I16,
        -0x0000_0000_8000_0000...0x0000_0000_7fff_ffff => I32,
        -0x8000_0000_0000_0000...0x7fff_ffff_ffff_ffff => I64,
        _ => I128
    }
}
The ( (executed instructions annotated with values in relevant registers as each is run)
The problem is that this is ending up with the value The assembly code that I see via passing
Portion of `--emit=asm` output
LBB99_376:
.loc 13 467 0
movq %r13, %rax
addq $128, %rax
movq %r14, %rcx
adcq $0, %rcx
cmpq $256, %rax
sbbq $0, %rcx
jae LBB99_383
Ltmp3275:
.loc 13 0 0 is_stmt 0
xorl %eax, %eax
jmp LBB99_386
LBB99_378:
Ltmp3276:
.loc 13 468 0 is_stmt 1
movq %r13, %rcx
addq $32768, %rcx
movq %r14, %rdx
adcq $0, %rdx
movb $1, %al
cmpq $65536, %rcx
sbbq $0, %rdx
jb LBB99_381
.loc 13 469 0
movl $2147483648, %ecx
addq %r13, %rcx
movq %r14, %rdx
adcq $0, %rdx
shrdq $32, %rdx, %rcx
shrq $32, %rdx
movb $2, %al
orq %rcx, %rdx
je LBB99_381
.loc 13 470 0
movabsq $-9223372036854775808, %rax
addq %rax, %r13
movq %r14, %rax
adcq $0, %rax
setne %al
addb $3, %al
Ltmp3277:
LBB99_381:
.loc 13 467 0
movq %r15, %rcx
addq $128, %rcx
movq %rdi, %rdx
adcq $0, %rdx
cmpq $256, %rcx
movl %esi, %r13d
sbbq $0, %rdx
jae LBB99_388
Ltmp3278:
.loc 13 0 0 is_stmt 0
xorl %ecx, %ecx
jmp LBB99_391
LBB99_383:
Ltmp3279:
.loc 13 468 0 is_stmt 1
movq %r13, %rcx
addq $32768, %rcx
movq %r14, %rdx
adcq $0, %rdx
movb $1, %al
cmpq $65536, %rcx
sbbq $0, %rdx
jb LBB99_386
.loc 13 469 0
movl $2147483648, %ecx
addq %r13, %rcx
movq %r14, %rdx
adcq $0, %rdx
shrdq $32, %rdx, %rcx
shrq $32, %rdx
movb $2, %al
orq %rcx, %rdx
je LBB99_386
.loc 13 470 0
movabsq $-9223372036854775808, %rax
addq %rax, %r13
adcq $0, %r14
setne %al
addb $3, %al
Note in particular that the dumped assembly includes (I think) comparisons against the large constant In short, my theory is that, for some reason, LLVM is optimizing away the code generated for this match arm on the -0x8000_0000_0000_0000...0x7fff_ffff_ffff_ffff => I64, but I believe this mal-optimization is occurring during LTO. It's not clear to me whether it's happening during ThinLTO or FatLTO. The stage0 binary that is used for this build of the stage1
which I believe would be a build that predates #47548 but comes after #46382 ; so it is at least conceivable that this bug (which is unfortunately quite hard to reproduce/minimize) is a problem in ThinLTO alone... |
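As an aid to reading the listing above, here is a rough sketch of the range-check pattern that the `addq`/`adcq` followed by `cmpq`/`sbbq` sequences appear to implement: bias the value so the range starts at zero, then do a single unsigned comparison. This is one reading of the assembly, not a statement about where the miscompile occurs; the function names are hypothetical, and the two-halves version is only meant to mirror the `addq $128` / `adcq $0` / `cmpq $256` / `sbbq $0` steps for the `I8` arm.

```rust
/// `lo <= x <= hi` as one unsigned comparison: bias the range so it
/// starts at zero, then compare against its width.
fn in_range_biased(x: i128, lo: i128, hi: i128) -> bool {
    let biased = (x as u128).wrapping_sub(lo as u128);
    let width = (hi as u128).wrapping_sub(lo as u128);
    biased <= width
}

/// The same check for the I8 range, on a 128-bit value split into two
/// 64-bit halves the way a 64-bit target has to hold it.
fn in_i8_range_halves(x: i128) -> bool {
    let lo = x as u64; // low 64 bits
    let hi = (x >> 64) as u64; // high 64 bits
    // add 128 to the low half, propagate the carry into the high half
    let (sum_lo, carry) = lo.overflowing_add(128);
    let sum_hi = hi.wrapping_add(carry as u64);
    // the 128-bit sum is below 256 only if the high half is zero
    sum_hi == 0 && sum_lo < 256
}

fn main() {
    let samples: [i128; 8] = [
        -129,
        -128,
        -1,
        0,
        127,
        128,
        i64::MIN as i128,
        i64::MAX as i128 + 1,
    ];
    for &x in samples.iter() {
        let plain = x >= -0x80 && x <= 0x7f;
        assert_eq!(in_range_biased(x, -0x80, 0x7f), plain);
        assert_eq!(in_i8_range_halves(x), plain);
    }
    // The I64 arm is the same pattern with a bias of 2^63.
    let (lo, hi) = (i64::MIN as i128, i64::MAX as i128);
    assert!(in_range_biased(lo, lo, hi));
    assert!(in_range_biased(hi, lo, hi));
    assert!(!in_range_biased(hi + 1, lo, hi));
    assert!(!in_range_biased(lo - 1, lo, hi));
    println!("biased checks agree with the plain range checks");
}
```

Under this reading, the `I64` arm reduces to "add 2^63, then test whether the high 64 bits are zero", which would match the `movabsq $-9223372036854775808` / `adcq $0` / `setne` sequence at the end of each block.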
This bug is especially scary to me given that we are currently making policy decisions about whether to turn ThinLTO on or off based solely on the compile-time/codesize impact, without any discussion of risk related to its correctness? Of course I understand that compiler bugs are a fact of life (including in LLVM), but it might be better to have a uniform policy everywhere, so that when a bug does arise, it actually has pressure to get fixed? That is, if we're going to turn ThinLTO off in one context based on codesize/compile-time issues, then I might argue we should have it off everywhere just to keep bug investigation under control... |
Nominating for discussion in the @rust-lang/compiler meeting. @pnkfelix can share what the heck is going on here. |
Unfortunately I'll be on PTO during the meeting tomorrow. I'm not sure what I have to share beyond the information I attempted to pack into my earlier comment. (I'm also a little shocked that I seem to be the only one who is running into this on any sort of consistent basis, on both Linux and Mac. But that could well be because I was working for a couple of weeks against a baseline point on master that was still using ThinLTO by default, which may or may not be a key catalyst for reproducing this bug...) |
(Indeed, it seems like updating to a newer master has made my problem go away again...) |
I am seeing this problem too =) and I believe @spastorino was too. I'm going to try rebasing, but it's definitely worrisome. @eddyb -- were you seeing this, or something else? |
I don't think I've seen this one, no. |
@nikomatsakis unsure if it's exactly the same thing or just related in some way. In my case 4 tests are consistently failing on my local machine. You can check my log here https://gist.github.com/anonymous/f452914ff6e474c465ca02d28e628a07 |
@spastorino I wouldn't be surprised, given the nature of the machine code I was seeing, if all of those have the same root cause. (Which, the current theory goes, is due to ThinLTO mis-optimizing handling of 128-bit values...) |
I encounter this from time to time. Rebasing has seemed to help though. |
I'm seeing something similar with |
should we re-open this? |
@nikomatsakis 👍 or open new issues 🙂 |
I just saw this once again with the
(*) An earlier version of this comment erroneously claimed the issue was arising during bootstrap of the compiler itself. That was wrong. And this time it's on a system with a working copy of |
I've been seeing this on https://github.com/Zoxc/rust/tree/run-fast-ice with |
The bug seems to have manifested itself fairly frequently when it was present. It has now been 5 years since anyone reported encountering it. A generous (and IMHO reasonable) interpretation of that is that the heisenbug is gone and that this I-ICE issue can be closed. |
Triage: No objections to the close proposal for two months. Closing! |
My config.toml:
And my (odd?) build invocation:
Results in stage 1 binary where we get weirdness on this compile-fail test:
I am pretty sure this is specific to debug builds. At least, that's my current best guess as to why this hasn't been caught by bors.
The code
layout.rs
looks like this:
It should probably be signalling a proper error, or trusting that the lint will eventually fire and catch this, rather than just triggering a call to bug! (since this does not represent a bug in the compiler itself, right?)