rustc build core dump on OpenBSD #31363

Closed
semarie opened this issue Feb 2, 2016 · 16 comments
Labels
O-openbsd Operating system: OpenBSD

Comments

@semarie
Contributor

semarie commented Feb 2, 2016

I am investigating a core dump issue (rustc compiler crash without panic) under OpenBSD during the build. The segfault occurs at stage2 during libcore building.

It seems to occur occasionally on the OpenBSD build host.

Note that rustc under OpenBSD uses the system malloc(3), which comes with several security options (see http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man5/malloc.conf.5).

Backtrace of one build:

(gdb) bt
#0  0x00000f38e4e53170 in str::from_utf8::h677e0dbe19ab7a5adYR ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/libstd-db5a760f.so
#1  0x00000f38e4de62a1 in ffi::os_str::OsString::into_string::h08ad47b28c773394gSe ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/libstd-db5a760f.so
#2  0x00000f38e4de8498 in env::Args.Iterator::next::he30da4a3586351acwle ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/libstd-db5a760f.so
#3  0x00000f38e27c5c7f in main::he8593f4eb18ffee8dMd ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/librustc_driver-db5a760f.so
#4  0x00000f38e4e1ac0e in sys_common::unwind::try::try_fn::h17969813569916126057 ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/libstd-db5a760f.so
#5  0x00000f38e4e122dc in __rust_try ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/libstd-db5a760f.so
#6  0x00000f38e4e0a046 in sys_common::unwind::inner_try::h5984569719e85a59C7s ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/libstd-db5a760f.so
#7  0x00000f38e4e1a724 in rt::lang_start::h7b315fa4cd5a7fa9Kiy ()
   from /data/semarie/obj/rust/x86_64-unknown-openbsd/stage2/lib/libstd-db5a760f.so
#8  0x00000f363ed00aa1 in _start () from /data/semarie/core/stage2/bin/rustc
#9  0x0000000000000000 in ?? ()
@semarie
Contributor Author

semarie commented Feb 2, 2016

@dhuseby Bitrig is relatively similar to OpenBSD. Do you have any problems building a recent tree of rustc under Bitrig? If not, could you try a build with export MALLOC_OPTIONS=S in your environment?

@steveklabnik steveklabnik added the O-openbsd Operating system: OpenBSD label Feb 2, 2016
@dhuseby

dhuseby commented Feb 2, 2016

@semarie I do sometimes get segfaults building rustc on Bitrig. I'll give the MALLOC_OPTIONS=S a shot. Thanks for the suggestion.

@nbaksalyar
Contributor

Hm, that's interesting.
I stumbled upon the same issue with str::from_utf8 segfault on Illumos (with the same backtrace), but I thought that the issue is related only to that platform.

@semarie, could you please try to check it with optimization turned off? You can do that by adding CFG_DISABLE_OPTIMIZE to your config.mk. It helped in my case.

Also, could you please provide an output from gdb's disassemble command?
I have a suspicion that it's somehow related to PR #30740 and the generated machine code is incorrect, because here's the part that segfaults in my case (it relates to run_utf8_validation function, I guess):

0x00000000004a533c <+44>:    lea    0x0(%rsi,%riz,1),%esi
0x00000000004a5340 <+48>:    movzbl (%rsi,%rcx,1),%r15d

For some reason it removes the most significant bits of $rsi (i.e. making 0x0000000089abcdef out of 0x0123456789abcdef). I haven't found out why yet, but it would be very helpful to compare this with OpenBSD.
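
The truncation described here matches a general x86-64 rule: an instruction that writes a 32-bit register clears the upper 32 bits of the corresponding 64-bit register. A minimal Rust sketch models this with integer casts. The function name `write_esi` is made up for illustration, and the code does not execute the actual `lea` instruction:

```rust
/// Models what happens to %rsi when an instruction writes its 32-bit
/// half %esi: the low 32 bits are kept and the upper 32 bits are zeroed.
/// `write_esi` is a hypothetical name used only for this illustration.
fn write_esi(rsi: u64) -> u64 {
    let esi = rsi as u32; // keep only the low 32 bits
    esi as u64 // zero-extend back to 64 bits
}

fn main() {
    let before: u64 = 0x0123_4567_89ab_cdef;
    let after = write_esi(before);
    println!("{:#018x} -> {:#018x}", before, after);
}
```

Any pointer above the 4 GiB boundary loses its upper half this way, so a following load through it would be expected to fault.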

@semarie
Contributor Author

semarie commented Feb 3, 2016

@nbaksalyar here is the disassembly (for a failure at 0x00000a4f69b44fd0):

0x00000a4f69b44fcc <_ZN3str9from_utf820hf64d73b10248ef4bdYRE+44>:       lea    0x0(%rsi),%esi
0x00000a4f69b44fd0 <_ZN3str9from_utf820hf64d73b10248ef4bdYRE+48>:       movzbl (%rsi,%rcx,1),%r15d

I will test a build with CFG_DISABLE_OPTIMIZE. Thanks for the possible workaround.

@nbaksalyar
Contributor

@semarie, thanks! Looks like it's the same problem: lea writes to %esi, which zeroes the upper 32 bits of %rsi, so the segfault is a natural consequence.
Perhaps this issue is present on other BSDs as well, so it's worth trying to find the root cause.
I'll try to bisect builds with and without the PR I linked.

@semarie
Contributor Author

semarie commented Feb 3, 2016

Because of the backtrace, I took a look at run_utf8_validation too. I also tried fuzzing it (copying all the related code into a test.rs file and looping on random values), but did not manage to expose any bad behaviour.
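
A fuzzing loop of the kind described here might look roughly like the sketch below. It is not the original test.rs: it calls the public `std::str::from_utf8` instead of a copied `run_utf8_validation`, uses a small hand-written xorshift generator to avoid external crates, and the helper name `fuzz_from_utf8` is invented for this example.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crate.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feeds `iterations` pseudo-random buffers to `std::str::from_utf8`
/// and returns how many were accepted as valid UTF-8.
/// `fuzz_from_utf8` is a hypothetical helper, not the original test.rs.
fn fuzz_from_utf8(seed: u64, iterations: usize) -> usize {
    let mut rng = XorShift64(seed);
    let mut valid = 0;
    for _ in 0..iterations {
        let len = (rng.next() % 64) as usize;
        let buf: Vec<u8> = (0..len).map(|_| (rng.next() & 0xff) as u8).collect();
        // Cross-check two std entry points; they must agree.
        let borrowed_ok = std::str::from_utf8(&buf).is_ok();
        let owned_ok = String::from_utf8(buf.clone()).is_ok();
        assert_eq!(borrowed_ok, owned_ok);
        if borrowed_ok {
            valid += 1;
        }
    }
    valid
}

fn main() {
    let valid = fuzz_from_utf8(0x2545_f491_4f6c_dd1d, 100_000);
    println!("{} of 100000 random buffers were valid UTF-8", valid);
}
```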

Another possible cause is the LLVM update (#30448), which touched unwinding (part of the backtrace is in sys_common::unwind).

@alexcrichton does this problem ring a bell?

@alexcrichton
Member

In theory that PR shouldn't have tampered with unwinding for Unix platforms at all (it should be the same as it was before), but the segfault does sound... bad?

@semarie
Contributor Author

semarie commented Feb 4, 2016

I have run several tests around the LLVM update (and I will continue in this direction to be sure):

  • building rustc (based on ef1a13b) with the 3 commits reverted
  • building rustc (based on 54664f1) with an explicit --llvm-root pointing to the previous LLVM version (src/llvm submodule cde1ed3196ba9b39bcf028e06e08a8722113a5cb).

In all cases the core dump no longer occurs. I will keep testing in this direction, as I am not sure the problem is systematic.

The fact that I could build and run with just another LLVM version (without reverting the commits) makes me think it could be an LLVM bug. I don't know for now.

@semarie
Contributor Author

semarie commented Feb 4, 2016

I have again done several builds with the previous src/llvm submodule (3564439), and I don't get any problems. My builds pass the testsuite too.

Some codegen comparison:

  • build with older LLVM : stage1/lib/rustlib/x86_64-unknown-openbsd/lib/libstd-db5a760f.so
    (this file was built using stage1/bin/rustc, so using the old LLVM libraries)
00000000000f8e50 <str::from_utf8::hf64d73b10248ef4bdYR>:
   f8e50:       55                      push   %rbp
   f8e51:       41 57                   push   %r15
   f8e53:       41 56                   push   %r14
   f8e55:       53                      push   %rbx
   f8e56:       48 85 d2                test   %rdx,%rdx
   f8e59:       0f 84 65 02 00 00       je     f90c4 <str::from_utf8::hf64d73b10248ef4bdYR+0x274>
   f8e5f:       4c 8d 5a f0             lea    0xfffffffffffffff0(%rdx),%r11
   f8e63:       31 c9                   xor    %ecx,%ecx
   f8e65:       4c 8d 0d 2d bb 12 00    lea    1227565(%rip),%r9        # 224999 <str::UTF8_CHAR_WIDTH::hb3baaadb65d72bbdqNS>
   f8e6c:       41 b8 01 c0 00 00       mov    $0xc001,%r8d
   f8e72:       49 ba 80 80 80 80 80    mov    $0x8080808080808080,%r10
   f8e79:       80 80 80 
   f8e7c:       90                      nop    
   f8e7d:       90                      nop    
   f8e7e:       90                      nop    
   f8e7f:       90                      nop    
   f8e80:       44 0f b6 3c 0e          movzbl (%rsi,%rcx,1),%r15d
   f8e85:       45 84 ff                test   %r15b,%r15b
   f8e88:       78 16                   js     f8ea0 <str::from_utf8::hf64d73b10248ef4bdYR+0x50>
   f8e8a:       8d 04 31                lea    (%rcx,%rsi,1),%eax
   f8e8d:       a8 07                   test   $0x7,%al
   f8e8f:       0f 84 1b 01 00 00       je     f8fb0 <str::from_utf8::hf64d73b10248ef4bdYR+0x160>
...
  • build with newer LLVM : stage1/lib/rustlib/x86_64-unknown-openbsd/lib/libstd-db5a760f.so
    (this file was built using stage1/bin/rustc, so using the new LLVM libraries)
00000000000f95e0 <str::from_utf8::hdcda68873f01cd68dYR>:
   f95e0:       55                      push   %rbp
   f95e1:       41 57                   push   %r15
   f95e3:       41 56                   push   %r14
   f95e5:       53                      push   %rbx
   f95e6:       48 85 d2                test   %rdx,%rdx
   f95e9:       0f 84 60 02 00 00       je     f984f <str::from_utf8::hdcda68873f01cd68dYR+0x26f>
   f95ef:       4c 8d 5a f0             lea    0xfffffffffffffff0(%rdx),%r11
   f95f3:       31 c9                   xor    %ecx,%ecx
   f95f5:       4c 8d 0d 11 c3 12 00    lea    1229585(%rip),%r9        # 22590d <str::UTF8_CHAR_WIDTH::he77aff40c91d8da6qNS>
   f95fc:       41 b8 01 c0 00 00       mov    $0xc001,%r8d
   f9602:       49 ba 80 80 80 80 80    mov    $0x8080808080808080,%r10
   f9609:       80 80 80 
   f960c:       8d 74 26 00             lea    0x0(%rsi),%esi
   f9610:       44 0f b6 3c 0e          movzbl (%rsi,%rcx,1),%r15d
   f9615:       45 84 ff                test   %r15b,%r15b
   f9618:       78 16                   js     f9630 <str::from_utf8::hdcda68873f01cd68dYR+0x50>
   f961a:       8d 04 31                lea    (%rcx,%rsi,1),%eax
   f961d:       a8 07                   test   $0x7,%al
   f961f:       74 5f                   je     f9680 <str::from_utf8::hdcda68873f01cd68dYR+0xa0>
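
The only difference before the `movzbl` in the two listings is the padding: four single-byte NOPs (`90`) in the older build versus the 4-byte sequence `8d 74 26 00` in the newer one. As a rough way to look for that sequence in other code, here is a sketch of a plain byte search. The names `SUSPECT_PADDING` and `find_suspect_padding` are made up for this example, and because it does not decode instructions it can report false positives:

```rust
/// The 4-byte sequence seen in the newer listing where the older one
/// had four single-byte NOPs (0x90).
const SUSPECT_PADDING: [u8; 4] = [0x8d, 0x74, 0x26, 0x00];

/// Returns the offsets at which `SUSPECT_PADDING` occurs in `code`.
/// This is a plain byte search, not a disassembler, so a match may
/// also be part of an unrelated instruction or of data.
fn find_suspect_padding(code: &[u8]) -> Vec<usize> {
    code.windows(SUSPECT_PADDING.len())
        .enumerate()
        .filter(|(_, w)| w[..] == SUSPECT_PADDING[..])
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Bytes copied from the two listings, starting at the `mov ...,%r10`.
    let older: [u8; 19] = [
        0x49, 0xba, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
        0x90, 0x90, 0x90, 0x90, 0x44, 0x0f, 0xb6, 0x3c, 0x0e,
    ];
    let newer: [u8; 19] = [
        0x49, 0xba, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
        0x8d, 0x74, 0x26, 0x00, 0x44, 0x0f, 0xb6, 0x3c, 0x0e,
    ];
    println!("older: {:?}", find_suspect_padding(&older));
    println!("newer: {:?}", find_suspect_padding(&newer));
}
```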

@semarie
Contributor Author

semarie commented Feb 4, 2016

The asm and the object code don't seem to match:

Building libcore-db5a760f.rlib with -C save-temps.

  • from core-db5a760f.s:
.Ltmp806:
        .cfi_offset %rbp, -16
        testq   %rdx, %rdx
        je      .LBB2909_46
        leaq    -16(%rdx), %r11
        xorl    %ecx, %ecx
        leaq    _ZN3str15UTF8_CHAR_WIDTH20he77aff40c91d8da6qNSE(%rip), %r9
        movl    $49153, %r8d
        movabsq $-9187201950435737472, %r10
        .align  16, 0x90
.LBB2909_2:
        movzbl  (%rsi,%rcx), %r15d
        testb   %r15b, %r15b
        js      .LBB2909_3

and the corresponding object code generated:

   6:   48 85 d2                test   %rdx,%rdx
   9:   0f 84 60 02 00 00       je     26f <str::from_utf8::hdcda68873f01cd68dYR+0x26f>
   f:   4c 8d 5a f0             lea    0xfffffffffffffff0(%rdx),%r11
  13:   31 c9                   xor    %ecx,%ecx
  15:   4c 8d 0d 00 00 00 00    lea    0(%rip),%r9        # 1c <str::from_utf8::hdcda68873f01cd68dYR+0x1c>
  1c:   41 b8 01 c0 00 00       mov    $0xc001,%r8d
  22:   49 ba 80 80 80 80 80    mov    $0x8080808080808080,%r10
  29:   80 80 80 
  2c:   8d 74 26 00             lea    0x0(%rsi),%esi
  30:   44 0f b6 3c 0e          movzbl (%rsi,%rcx,1),%r15d
  35:   45 84 ff                test   %r15b,%r15b
  38:   78 16                   js     50 <str::from_utf8::hdcda68873f01cd68dYR+0x50>
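
The mismatch sits exactly where the `.align 16, 0x90` directive takes effect: the `mov` ends at offset 0x2c and the loop label has to start at 0x30, so four bytes of padding are needed, and the object code fills them with `8d 74 26 00` rather than four `0x90` bytes. The arithmetic can be sketched as follows (`padding_for` is a name chosen for this example):

```rust
/// Number of padding bytes needed to bring `offset` up to the next
/// multiple of `align` (which must be a power of two).
fn padding_for(offset: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    (align - (offset & (align - 1))) & (align - 1)
}

fn main() {
    // Offsets taken from the listings in this thread.
    for &offset in &[0x2c_u64, 0xf8e7c, 0xf960c] {
        println!(
            "offset {:#x}: {} padding byte(s)",
            offset,
            padding_for(offset, 16)
        );
    }
}
```

All three offsets need the same amount of padding, which is consistent with the older build emitting four NOPs at the same position.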

I tried to build libcore.rlib using -C no-integrated-as in order to rule out the integrated assembler, but it failed with:

error: linking with `egcc` failed: exit code: 1
note: "egcc" "-c" "-o" "x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.o" "x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s"
note: x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s: Assembler messages:
x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s:67115: Error: unknown pseudo-op: `.cfi_personality'
x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s:67116: Error: unknown pseudo-op: `.cfi_lsda'
x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s:67878: Error: unknown pseudo-op: `.cfi_personality'
x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s:67879: Error: unknown pseudo-op: `.cfi_lsda'
x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s:85536: Error: unknown pseudo-op: `.cfi_personality'
x86_64-unknown-openbsd/stage1/lib/rustlib/x86_64-unknown-openbsd/lib/core-db5a760f.s:85537: Error: unknown pseudo-op: `.cfi_lsda'

error: aborting due to previous error

@dhuseby

dhuseby commented Feb 5, 2016

BTW, I tried building the latest with the MALLOC_OPTIONS=S and I'm still getting segfaults.

@semarie
Contributor Author

semarie commented Feb 6, 2016

@dhuseby I think you missed the point: the purpose of testing a build with MALLOC_OPTIONS=S was to check whether you get segfaults systematically (as opposed to "sometimes" without it).

But for now, I think the issue could be a bug somewhere in code generation (in LLVM or in rustc, I dunno).

@semarie
Contributor Author

semarie commented Feb 6, 2016

@steveklabnik could you update labels ?

  • For now, it is clear that segfaults occur systematically on Illumos and on OpenBSD, and only occasionally (@dhuseby can you confirm my understanding?) on Bitrig
  • The problem seems related to code generation

@nbaksalyar
Contributor

@Zoxc filed an LLVM bug: https://llvm.org/bugs/show_bug.cgi?id=26554
Here's a linked issue: #31505

@semarie
Contributor Author

semarie commented Feb 10, 2016

\o/

@semarie
Contributor Author

semarie commented Feb 10, 2016

I confirm that reverting the LLVM patch rL253557 ("Alternative to long nops for X86 CPUs") solves the issue (build and testsuite pass).

semarie added a commit to semarie/rust that referenced this issue Feb 17, 2016
The initial purpose is to workaround the LLVM bug
https://llvm.org/bugs/show_bug.cgi?id=26554 for OpenBSD.

By default, the `cpu` is defined to `generic`. But with a 64bit
processor, the optimization for `generic` will use invalid asm code as
NOP (the generated code for NOP isn't a NOP).

According to rust-lang#20777, "x86-64" is the right thing to do for x86_64
builds.

Closes: rust-lang#31363
bors added a commit that referenced this issue Feb 18, 2016
The initial purpose is to workaround the LLVM bug
https://llvm.org/bugs/show_bug.cgi?id=26554 for OpenBSD.

By default, the `cpu` is defined to `generic`. But with a 64bit
processor, the optimization for `generic` will use invalid asm code as
NOP (the generated code for NOP isn't a NOP).

According to #20777, "x86-64" is the right thing to do for x86_64
builds.

Closes: #31363

r? @alexcrichton
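
The idea behind the workaround in the commit message can be sketched as below. This is a simplified illustration, not the code of the referenced commit: `TargetOptions` is a hypothetical stand-in with a single field, and `x86_64_openbsd_options` is a name invented for this example.

```rust
/// Hypothetical, simplified stand-in for a compiler target description.
/// The real rustc type has many more fields; only `cpu` matters here.
#[derive(Debug, Clone, PartialEq)]
struct TargetOptions {
    cpu: String,
}

impl Default for TargetOptions {
    fn default() -> Self {
        // Per the commit message, the default CPU is "generic".
        TargetOptions {
            cpu: "generic".to_string(),
        }
    }
}

/// Sketch of the workaround: for a 64-bit x86 target, override the CPU
/// so that code generation does not use the "generic" tuning.
fn x86_64_openbsd_options() -> TargetOptions {
    let mut opts = TargetOptions::default();
    opts.cpu = "x86-64".to_string();
    opts
}

fn main() {
    println!("default: {:?}", TargetOptions::default());
    println!("openbsd: {:?}", x86_64_openbsd_options());
}
```

For a single compilation, rustc's `-C target-cpu` codegen option selects the CPU on the command line.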