[VPP-907] VPP with worker threads crashes on 4K VXLAN/BD setup #2346

Closed
vvalderrv opened this issue Feb 1, 2025 · 0 comments

Description

VPP running with 2 worker threads may crash on the performance testbed when 4K bridge domains (BDs) and 4K VXLAN tunnels are configured. Whether the crash occurs depends on how traffic is started in the L2-to-VXLAN encap direction. This is the backtrace from the core file:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `vpp -c /etc/vpp/startup.conf'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f7c31002c37 in read_alias_file (fname=,
    fname_len=) at localealias.c:335
335    localealias.c: No such file or directory.
(gdb) bt
#0  0x00007f7c31002c37 in read_alias_file (fname=,
    fname_len=) at localealias.c:335
#1  0x00007f7c32b43634 in unix_signal_handler (signum=11, si=0x7f7bf18e2570,
    uc=0x7f7bf18e2440) at /scratch/loj/vpp1707/build-data/../src/vlib/unix/main.c:118
#2  <signal handler called>
#3  0x00007f7c32ad0665 in vlib_get_frame_no_check (vm=0x0, frame_index=4294967295)
    at /scratch/loj/vpp1707/build-data/../src/vlib/node_funcs.h:221
#4  0x00007f7c32ad078d in vlib_get_frame (vm=0x7f7bf12b7d1c, frame_index=4294967295)
    at /scratch/loj/vpp1707/build-data/../src/vlib/node_funcs.h:241
#5  0x00007f7c32ad2b2f in vlib_put_next_frame (vm=0x7f7bf12b7d1c, r=0x7f7bf5fc6e9c,
    next_index=0, n_vectors_left=0)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:472
#6  0x00007f7c32046548 in l2output_node_inline (vm=0x7f7bf12b7d1c, node=0x7f7bf5fc6e9c,
    frame=0x7f7bf3a05600, do_trace=0)
    at /scratch/loj/vpp1707/build-data/../src/vnet/l2/l2_output.c:441
#7  0x00007f7c320465e3 in l2output_node_fn (vm=0x7f7bf12b7d1c, node=0x7f7bf5fc6e9c,
    frame=0x7f7bf3a05600) at /scratch/loj/vpp1707/build-data/../src/vnet/l2/l2_output.c:453
#8  0x00007f7c32ad40ff in dispatch_node (vm=0x7f7bf12b7d1c, node=0x7f7bf5fc6e9c,
    type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING,
    frame=0x7f7bf3a05600, last_time_stamp=11817859168407334)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:1016
#9  0x00007f7c32ad46b8 in dispatch_pending_node (vm=0x7f7bf12b7d1c, pending_frame_index=5,
    last_time_stamp=11817859168407334)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:1166
#10 0x00007f7c32ad6734 in vlib_main_or_worker_loop (vm=0x7f7bf12b7d1c, is_main=0)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:1625
#11 0x00007f7c32ad6828 in vlib_worker_loop (vm=0x7f7bf12b7d1c)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:1650
#12 0x00007f7c32b2021c in vlib_worker_thread_fn (arg=0x7f7bf0725ac0)
    at /scratch/loj/vpp1707/build-data/../src/vlib/threads.c:1378
#13 0x00007f7c31809190 in clib_calljmp ()
    at /scratch/loj/vpp1707/build-data/../src/vppinfra/longjmp.S:110
#14 0x00007f7b219fcd40 in ?? ()
#15 0x00007f7c32b1b753 in vlib_worker_thread_bootstrap_fn (arg=0x7f7bf0725ac0)
    at /scratch/loj/vpp1707/build-data/../src/vlib/threads.c:464
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(gdb) frame 6
#6  0x00007f7c32046548 in l2output_node_inline (vm=0x7f7bf12b7d1c, node=0x7f7bf5fc6e9c,
    frame=0x7f7bf3a05600, do_trace=0)
    at /scratch/loj/vpp1707/build-data/../src/vnet/l2/l2_output.c:441
(gdb) p cached_sw_if_index
$1 = 2564
(gdb) p cached_next_index
$2 = 0
(gdb) down
#5  0x00007f7c32ad2b2f in vlib_put_next_frame (vm=0x7f7bf12b7d1c, r=0x7f7bf5fc6e9c,
    next_index=0, n_vectors_left=0)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:472
(gdb) down
#4  0x00007f7c32ad078d in vlib_get_frame (vm=0x7f7bf12b7d1c, frame_index=4294967295)
    at /scratch/loj/vpp1707/build-data/../src/vlib/node_funcs.h:241
(gdb) up
#5  0x00007f7c32ad2b2f in vlib_put_next_frame (vm=0x7f7bf12b7d1c, r=0x7f7bf5fc6e9c,
    next_index=0, n_vectors_left=0)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:472
(gdb) up
#6  0x00007f7c32046548 in l2output_node_inline (vm=0x7f7bf12b7d1c, node=0x7f7bf5fc6e9c,
    frame=0x7f7bf3a05600, do_trace=0)
    at /scratch/loj/vpp1707/build-data/../src/vnet/l2/l2_output.c:441
(gdb) p *node
$3 = {cacheline0 = 0x7f7bf5fc6e9c "ye\004\062|\177",
  function = 0x7f7c32046579 <l2output_node_fn>, errors = 0x7f7bf0acfd50,
  clocks_since_last_overflow = 0, max_clock = 763318, max_clock_n = 256,
  calls_since_last_overflow = 0, vectors_since_last_overflow = 0, next_frame_index = 870,
  node_index = 292, input_main_loops_per_call = 0,
  main_loop_count_last_dispatch = 2747018781, main_loop_vector_stats = {1024, 0},
  flags = 0, state = 0, n_next_nodes = 9, cached_next_index = 0, thread_index = 2,
  runtime_data = 0x7f7bf5fc6ee2 ""}
(gdb) down
#5  0x00007f7c32ad2b2f in vlib_put_next_frame (vm=0x7f7bf12b7d1c, r=0x7f7bf5fc6e9c,
    next_index=0, n_vectors_left=0)
    at /scratch/loj/vpp1707/build-data/../src/vlib/main.c:472
(gdb) p *nf
$4 = {frame_index = 4294967295, node_runtime_index = 281, flags = 0,
  vectors_since_last_overflow = 0}
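
Note on the values above: `frame_index = 4294967295` in frames #3 to #5 and in `*nf` is 0xFFFFFFFF, that is, all ones in an unsigned 32-bit field. The C sketch below only illustrates that equivalence and a guard that tests for it. It assumes, without confirmation from this report, that the all-ones value acts as a "no frame attached" sentinel. `INVALID_FRAME_INDEX`, `next_frame_sketch_t`, and `next_frame_is_unallocated` are hypothetical names for illustration, not vlib APIs, and the struct is a simplified stand-in that only mirrors the fields printed by `p *nf`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical name: all-ones 32-bit value, assumed to mean "no frame". */
#define INVALID_FRAME_INDEX ((uint32_t) ~0u)

/* Simplified stand-in for the fields shown by `p *nf`; not the real vlib type. */
typedef struct
{
  uint32_t frame_index;
  uint32_t node_runtime_index;
  uint32_t flags;
  uint32_t vectors_since_last_overflow;
} next_frame_sketch_t;

/* Returns 1 when the slot carries the all-ones sentinel, 0 otherwise. */
static int
next_frame_is_unallocated (const next_frame_sketch_t * nf)
{
  return nf->frame_index == INVALID_FRAME_INDEX;
}

/* Prints the frame index in decimal and hex, plus the guard result. */
static void
describe_next_frame (const next_frame_sketch_t * nf)
{
  printf ("frame_index = %u (0x%08x), unallocated = %d\n",
          (unsigned) nf->frame_index, (unsigned) nf->frame_index,
          next_frame_is_unallocated (nf));
}
```

Calling `describe_next_frame` on a struct filled with the values from `$4` (frame_index 4294967295, node_runtime_index 281) would report the slot as unallocated under this assumption, which is consistent with the lookup in `vlib_get_frame_no_check` faulting.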

Assignee

John Lo

Reporter

John Lo

Comments

No comments.

Original issue: https://jira.fd.io/browse/VPP-907
