[VPP-1334] VPP crash on stats thread #2800
Comments
Hongjun Ni, is this still an issue in 19.01?
Jong Hahn, yes, it is fixed. Thank you!
Description
Below is the backtrace log:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffbb894f700 (LWP 5066)]
0x00007ffff50361f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) backtrace
#0 0x00007ffff50361f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff50378e8 in __GI_abort () at abort.c:90
#2 0x0000000000407c69 in os_panic () at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/vnet/main.c:310
#3 0x00007ffff5ddb537 in os_out_of_memory () at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/unix-misc.c:221
#4 0x00007ffff5db73fa in clib_mem_alloc_aligned_at_offset (size=60300492, align=4, align_offset=4, os_out_of_memory_on_failure=1) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/mem.h:105
#5 0x00007ffff5db7834 in vec_resize_allocate_memory (v=0x7fffac3cd06c, length_increment=60300488, data_bytes=60300492, header_bytes=4, data_align=4) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/vec.c:84
#6 0x00007ffff5d739cc in _vec_resize_inline (v=0x7fffac3cd06c, length_increment=60300488, data_bytes=60300488, header_bytes=0, data_align=1) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/vec.h:145
#7 0x00007ffff5d77fd6 in serialize_write_not_inline (m=0x7fffb8debd20, s=0x7fffb8debd80, n_bytes_to_write=60300488, flags=2) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.c:739
#8 0x00007ffff5d78779 in serialize_read_write_not_inline (m=0x7fffb8debd20, s=0x7fffb8debd80, n_bytes=60300488, flags=2) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.c:846
#9 0x00007ffff7b9e187 in serialize_stream_read_write (header=0x7fffb8debd20, s=0x7fffb8debd80, n_bytes=60300488, flags=2) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.h:140
#10 0x00007ffff7b9e237 in serialize_get (m=0x7fffb8debd20, n_bytes=60300488) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.h:180
#11 0x00007ffff7b9eb10 in vlib_node_serialize (vm=0x7ffff7b8ca40 <vlib_global_main>, node_dups=0x7fffac3cd090, vector=0x7fffac417324 "\023\022\006\023null-node\017\001\001\001\003", include_nexts=0, include_stats=1)
at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vlibapi/node_serialize.c:122
#12 0x0000000000426ffa in update_serialized_nodes (sm=0x709f60 <stats_main>) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/stats/stat_segment.c:424
#13 0x00000000004273af in do_stat_segment_updates (sm=0x709f60 <stats_main>) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/stats/stat_segment.c:508
#14 0x0000000000421f06 in stats_thread_fn (arg=0x7fffb5102700) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/stats/stats.c:2489
#15 0x00007ffff5d461d8 in clib_calljmp () at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/longjmp.S:110
#16 0x00007ffbb894eed0 in ?? ()
#17 0x00007ffff791c853 in vlib_worker_thread_bootstrap_fn (arg=0x7fffb5102700) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vlib/threads.c:681
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
Thanks,
Hongjun
Assignee: Unassigned
Reporter: Hongjun Ni
Comments
I am observing the same issue. In my case, it happens only when interfaces are configured (IPs assigned, etc.) and up. I observe the issue when i = 521 and j = 1 in the "for" loop of the vlib_node_serialize() function.
So I set a watchpoint on this variable: (((vec_header_t *) (stats_main.node_dups[1][521].name) - 1)->len). It looks like reallocating the vector causes the problem: as we can see in your backtrace (frame #4), the calculated new size is 60300492 bytes. In my case, the big value (35500380) was written while a vector was being resized (sometimes the backtrace is slightly different because the vector is reallocated in a different place):
Old value = 17
New value = 35500380
set_free_elt (n_user_data_bytes=, uoffset=, v=0x7ff9e778b000) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c:185
185 /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c: No such file or directory.
(gdb) bt
#0 set_free_elt (n_user_data_bytes=, uoffset=, v=0x7ff9e778b000) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c:185
#1 mheap_put (v=0x7ff9e778b000, uoffset=) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c:856
#2 0x00007ffa6892a8ed in clib_mem_free (p=0x7ff9e9161e9c) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mem.h:186
#3 vec_resize_allocate_memory (v=, length_increment=length_increment@entry=1, data_bytes=, header_bytes=, header_bytes@entry=0, data_align=data_align@entry=8)
at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/vec.c:96
#4 0x00005562f757c1e5 in _vec_resize_inline (data_align=, header_bytes=, data_bytes=, length_increment=, v=)
at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/vec.h:145
#5 show_stat_segment_command_fn (vm=, input=, cmd=) at /home/lukasz/deploy/vpp/build-data/../src/vpp/stats/stat_segment.c:356
#6 0x00007ffa6976df7e in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ffa699d8f80 <vlib_global_main>, cm=cm@entry=0x7ffa699d9160 <vlib_global_main+480>, input=input@entry=0x7ff9e98bcf60,
parent_command_index=) at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:589
#7 0x00007ffa6976e414 in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ffa699d8f80 <vlib_global_main>, cm=cm@entry=0x7ffa699d9160 <vlib_global_main+480>, input=input@entry=0x7ff9e98bcf60,
parent_command_index=) at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:567
#8 0x00007ffa6976e414 in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ffa699d8f80 <vlib_global_main>, cm=cm@entry=0x7ffa699d9160 <vlib_global_main+480>, input=input@entry=0x7ff9e98bcf60,
parent_command_index=parent_command_index@entry=0) at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:567
#9 0x00007ffa6976e7d0 in vlib_cli_input (vm=0x7ffa699d8f80 <vlib_global_main>, input=input@entry=0x7ff9e98bcf60, function=function@entry=0x7ffa697b5230 <unix_vlib_cli_output>, function_arg=function_arg@entry=1)
at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:663
#10 0x00007ffa697b7005 in unix_cli_process_input (cm=0x7ffa699d8d60 <unix_cli_main>, cli_file_index=1) at /home/lukasz/deploy/vpp/build-data/../src/vlib/unix/cli.c:2419
#11 unix_cli_process (vm=0x7ffa699d8f80 <vlib_global_main>, rt=0x7ff9e98ac000, f=) at /home/lukasz/deploy/vpp/build-data/../src/vlib/unix/cli.c:2535
#12 0x00007ffa697817c6 in vlib_process_bootstrap (_a=) at /home/lukasz/deploy/vpp/build-data/../src/vlib/main.c:1231
#13 0x00007ffa688f4768 in clib_calljmp () at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/longjmp.S:110
#14 0x00007ff9e7b8abf0 in ?? ()
#15 0x00007ffa69782959 in vlib_process_startup (f=0x0, p=0x7ff9e98ac000, vm=0x7ffa699d8f80 <vlib_global_main>) at /home/lukasz/deploy/vpp/build-data/../src/vlib/main.c:1253
#16 dispatch_process (vm=0x7ffa699d8f80 <vlib_global_main>, p=0x7ff9e98ac000, last_time_stamp=0, f=0x0) at /home/lukasz/deploy/vpp/build-data/../src/vlib/main.c:1298
#17 0x2d726573752d7473 in ?? ()
#18 0x6d2f057475706e69 in ?? ()
#19 0x6c6961662070616d in ?? ()
#20 0x0000000a00657275 in ?? ()
#21 0x00000000000001a2 in ?? ()
#22 0x00007ff9e915a0a8 in ?? ()
#23 0x00007ff9e86be2bc in ?? ()
#24 0x0000000000000000 in ?? ()
Steps to reproduce in my case:
- Run vpp.
- Set MACs/IPs and bring the interfaces up.
- Issue: vppctl show statistics segment (sometimes it crashes on the first call; other times I need to call it 3-4 times).
Best regards,
Lukasz
Original issue: https://jira.fd.io/browse/VPP-1334