[VPP-1334] VPP crash on stats thread #2800

Closed
vvalderrv opened this issue Feb 1, 2025 · 3 comments

@vvalderrv

Description

Below is the backtrace log:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffbb894f700 (LWP 5066)]
0x00007ffff50361f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) backtrace
#0 0x00007ffff50361f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff50378e8 in __GI_abort () at abort.c:90
#2 0x0000000000407c69 in os_panic () at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/vnet/main.c:310
#3 0x00007ffff5ddb537 in os_out_of_memory () at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/unix-misc.c:221
#4 0x00007ffff5db73fa in clib_mem_alloc_aligned_at_offset (size=60300492, align=4, align_offset=4, os_out_of_memory_on_failure=1) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/mem.h:105
#5 0x00007ffff5db7834 in vec_resize_allocate_memory (v=0x7fffac3cd06c, length_increment=60300488, data_bytes=60300492, header_bytes=4, data_align=4) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/vec.c:84
#6 0x00007ffff5d739cc in _vec_resize_inline (v=0x7fffac3cd06c, length_increment=60300488, data_bytes=60300488, header_bytes=0, data_align=1) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/vec.h:145
#7 0x00007ffff5d77fd6 in serialize_write_not_inline (m=0x7fffb8debd20, s=0x7fffb8debd80, n_bytes_to_write=60300488, flags=2) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.c:739
#8 0x00007ffff5d78779 in serialize_read_write_not_inline (m=0x7fffb8debd20, s=0x7fffb8debd80, n_bytes=60300488, flags=2) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.c:846
#9 0x00007ffff7b9e187 in serialize_stream_read_write (header=0x7fffb8debd20, s=0x7fffb8debd80, n_bytes=60300488, flags=2) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.h:140
#10 0x00007ffff7b9e237 in serialize_get (m=0x7fffb8debd20, n_bytes=60300488) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/serialize.h:180
#11 0x00007ffff7b9eb10 in vlib_node_serialize (vm=0x7ffff7b8ca40 <vlib_global_main>, node_dups=0x7fffac3cd090, vector=0x7fffac417324 "\023\022\006\023null-node\017\001\001\001\003", include_nexts=0, include_stats=1)
    at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vlibapi/node_serialize.c:122
#12 0x0000000000426ffa in update_serialized_nodes (sm=0x709f60 <stats_main>) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/stats/stat_segment.c:424
#13 0x00000000004273af in do_stat_segment_updates (sm=0x709f60 <stats_main>) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/stats/stat_segment.c:508
#14 0x0000000000421f06 in stats_thread_fn (arg=0x7fffb5102700) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vpp/stats/stats.c:2489
#15 0x00007ffff5d461d8 in clib_calljmp () at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vppinfra/longjmp.S:110
#16 0x00007ffbb894eed0 in ?? ()
#17 0x00007ffff791c853 in vlib_worker_thread_bootstrap_fn (arg=0x7fffb5102700) at /home/ytatsumi/git/multi-port03/vpp/build-data/../src/vlib/threads.c:681
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

Thanks,

Hongjun

Assignee

Unassigned

Reporter

Hongjun Ni

Comments

  • henry_ni (Mon, 18 Feb 2019 03:18:27 +0000): Jong Hahn, yes, it is fixed. Thank you!

  • jhahn (Sun, 17 Feb 2019 23:01:26 +0000): Hongjun Ni, is this still an issue in 19.01?
  • lukaszmajczak (Wed, 11 Jul 2018 09:38:59 +0000):

    I am observing the same issue. In my case it happens only when interfaces are configured (IPs assigned, etc.) and up, and it hits when i = 521 and j = 1 in the "for" loop in the vlib_node_serialize() function.

    I set a watchpoint on ((vec_header_t *) (stats_main.node_dups[1][521].name) - 1)->len, and it looks like reallocating the vector causes the problem. As your backtrace shows (frame #4), the calculated new size is 60300492 bytes. In my case the oversized value (35500380) was written while the vector was being resized (the backtrace sometimes differs slightly because the vector is reallocated in a different place):

Old value = 17
New value = 35500380
set_free_elt (n_user_data_bytes=<optimized out>, uoffset=<optimized out>, v=0x7ff9e778b000) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c:185
185 /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c: No such file or directory.
(gdb) bt
#0 set_free_elt (n_user_data_bytes=<optimized out>, uoffset=<optimized out>, v=0x7ff9e778b000) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c:185
#1 mheap_put (v=0x7ff9e778b000, uoffset=<optimized out>) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mheap.c:856
#2 0x00007ffa6892a8ed in clib_mem_free (p=0x7ff9e9161e9c) at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/mem.h:186
#3 vec_resize_allocate_memory (v=<optimized out>, length_increment=length_increment@entry=1, data_bytes=<optimized out>, header_bytes=<optimized out>, header_bytes@entry=0, data_align=data_align@entry=8)
    at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/vec.c:96
#4 0x00005562f757c1e5 in _vec_resize_inline (data_align=<optimized out>, header_bytes=<optimized out>, data_bytes=<optimized out>, length_increment=<optimized out>, v=<optimized out>)
    at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/vec.h:145
#5 show_stat_segment_command_fn (vm=<optimized out>, input=<optimized out>, cmd=<optimized out>) at /home/lukasz/deploy/vpp/build-data/../src/vpp/stats/stat_segment.c:356
#6 0x00007ffa6976df7e in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ffa699d8f80 <vlib_global_main>, cm=cm@entry=0x7ffa699d9160 <vlib_global_main+480>, input=input@entry=0x7ff9e98bcf60,
    parent_command_index=<optimized out>) at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:589
#7 0x00007ffa6976e414 in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ffa699d8f80 <vlib_global_main>, cm=cm@entry=0x7ffa699d9160 <vlib_global_main+480>, input=input@entry=0x7ff9e98bcf60,
    parent_command_index=<optimized out>) at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:567
#8 0x00007ffa6976e414 in vlib_cli_dispatch_sub_commands (vm=vm@entry=0x7ffa699d8f80 <vlib_global_main>, cm=cm@entry=0x7ffa699d9160 <vlib_global_main+480>, input=input@entry=0x7ff9e98bcf60,
    parent_command_index=parent_command_index@entry=0) at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:567
#9 0x00007ffa6976e7d0 in vlib_cli_input (vm=0x7ffa699d8f80 <vlib_global_main>, input=input@entry=0x7ff9e98bcf60, function=function@entry=0x7ffa697b5230 <unix_vlib_cli_output>, function_arg=function_arg@entry=1)
    at /home/lukasz/deploy/vpp/build-data/../src/vlib/cli.c:663
#10 0x00007ffa697b7005 in unix_cli_process_input (cm=0x7ffa699d8d60 <unix_cli_main>, cli_file_index=1) at /home/lukasz/deploy/vpp/build-data/../src/vlib/unix/cli.c:2419
#11 unix_cli_process (vm=0x7ffa699d8f80 <vlib_global_main>, rt=0x7ff9e98ac000, f=<optimized out>) at /home/lukasz/deploy/vpp/build-data/../src/vlib/unix/cli.c:2535
#12 0x00007ffa697817c6 in vlib_process_bootstrap (_a=<optimized out>) at /home/lukasz/deploy/vpp/build-data/../src/vlib/main.c:1231
#13 0x00007ffa688f4768 in clib_calljmp () at /home/lukasz/deploy/vpp/build-data/../src/vppinfra/longjmp.S:110
#14 0x00007ff9e7b8abf0 in ?? ()
#15 0x00007ffa69782959 in vlib_process_startup (f=0x0, p=0x7ff9e98ac000, vm=0x7ffa699d8f80 <vlib_global_main>) at /home/lukasz/deploy/vpp/build-data/../src/vlib/main.c:1253
#16 dispatch_process (vm=0x7ffa699d8f80 <vlib_global_main>, p=0x7ff9e98ac000, last_time_stamp=0, f=0x0) at /home/lukasz/deploy/vpp/build-data/../src/vlib/main.c:1298
#17 0x2d726573752d7473 in ?? ()
#18 0x6d2f057475706e69 in ?? ()
#19 0x6c6961662070616d in ?? ()
#20 0x0000000a00657275 in ?? ()
#21 0x00000000000001a2 in ?? ()
#22 0x00007ff9e915a0a8 in ?? ()
#23 0x00007ff9e86be2bc in ?? ()
#24 0x0000000000000000 in ?? ()

 

Steps to reproduce in my case:

  1. Run vpp.
  2. Set MACs/IPs and bring the interfaces up.
  3. Run: vppctl show statistics segment (sometimes it crashes on the first call, other times I need to call it 3-4 times).

Best regards,

Lukasz

Original issue: https://jira.fd.io/browse/VPP-1334


