[VPP-911] lisp-gpe/lisp_gpe_adjacency.c:366 (lisp_gpe_build_rewrite) assertion `0' fails while replaying an API trace #2350

Closed
vvalderrv opened this issue Feb 1, 2025 · 5 comments

Description

While reproducing VPP-910, I observed that a "naive" API trace replay triggers an assert. As discussed on the team call today, I am creating a separate bug for that.

 

The VPP is run in "make debug" inside a KVM VM with 3 e1000 interfaces:

 

DBGvpp# show int
Name                   Idx   State   Counter   Count
GigabitEthernet0/4/0   1     down
GigabitEthernet0/5/0   2     down
GigabitEthernet0/6/0   3     down
local0                 0     down
DBGvpp#

 

The assert happens with the default config, i.e. immediately after running "make debug" and replaying the trace. The workaround is to zero the print handler for that message before replaying; this skips the replay of that message and avoids hitting the assert. So this is not a showstopper for me, but it does not seem to be very robust behaviour, hence the decision to create a bug.
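To make the workaround concrete, here is a minimal, self-contained sketch of a replay loop that skips any message whose print handler has been cleared. It is only an illustration of the behaviour described above: `api_table_t`, `traced_msg_t`, `replay_trace`, `demo_handler` and the table size are hypothetical names invented for this example, not VPP's actual structures or replay code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_MSG_IDS 64 /* hypothetical table size, chosen for the example */

typedef void (*msg_handler_fn)(const void *msg);

/* Hypothetical stand-in for an API dispatch table (not VPP's api_main_t). */
typedef struct {
    msg_handler_fn handlers[N_MSG_IDS];       /* execute the message */
    msg_handler_fn print_handlers[N_MSG_IDS]; /* pretty-print the message */
} api_table_t;

/* One recorded message in a hypothetical trace. */
typedef struct {
    uint16_t msg_id;
    const void *payload;
} traced_msg_t;

/* Demo handler that only counts how often it was invoked. */
int demo_calls;
void demo_handler(const void *msg)
{
    (void)msg;
    demo_calls++;
}

/*
 * Replay a trace. A message is skipped when its id is out of range or when
 * either its handler or its print handler is NULL, which is the effect the
 * workaround relies on. Returns the number of messages actually replayed.
 */
size_t replay_trace(const api_table_t *t, const traced_msg_t *trace, size_t n)
{
    size_t replayed = 0;

    assert(t != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t id = trace[i].msg_id;

        if (id >= N_MSG_IDS || t->handlers[id] == NULL ||
            t->print_handlers[id] == NULL)
            continue; /* cleared entry: do not replay this message */

        t->handlers[id](trace[i].payload);
        replayed++;
    }
    return replayed;
}
```

In this sketch, clearing `print_handlers[41]` before calling `replay_trace` plays the role of the gdb assignment used as the workaround: every other message in the trace is still replayed.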

 

0: /home/ayourtch/vpp/build-data/../src/vnet/lisp-gpe/lisp_gpe_adjacency.c:366 (lisp_gpe_build_rewrite) assertion `0' fails

Program received signal SIGABRT, Aborted.

0x00007ffff54fc1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

(gdb) bt

#0 0x00007ffff54fc1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

#1 0x00007ffff54fd8c8 in __GI_abort () at abort.c:90

#2 0x0000000000407db9 in os_panic () at /home/ayourtch/vpp/build-data/../src/vpp/vnet/main.c:263

#3 0x00007ffff61e2492 in debugger () at /home/ayourtch/vpp/build-data/../src/vppinfra/error.c:84

#4 0x00007ffff61e2899 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff7565de0 "%s:%d (%s) assertion `%s' fails") at /home/ayourtch/vpp/build-data/../src/vppinfra/error.c:143

#5 0x00007ffff724d955 in lisp_gpe_build_rewrite (vnm=0x6f7140 <vnet_main>, sw_if_index=5, link_type=VNET_LINK_ARP, dst_address=0x0) at /home/ayourtch/vpp/build-data/../src/vnet/lisp-gpe/lisp_gpe_adjacency.c:366

#6 0x00007ffff74a0af4 in vnet_rewrite_for_sw_interface (vnm=0x6f7140 <vnet_main>, link_type=VNET_LINK_ARP, sw_if_index=5, node_index=256, dst_address=0x0, rw=0x7fffb63fcac0, max_rewrite_bytes=112) at /home/ayourtch/vpp/build-data/../src/vnet/adj/rewrite.c:133

#7 0x00007ffff7482bb0 in adj_glean_add_or_lock (proto=FIB_PROTOCOL_IP4, sw_if_index=5, nh_addr=0x0) at /home/ayourtch/vpp/build-data/../src/vnet/adj/adj_glean.c:73

#8 0x00007ffff746352d in fib_path_attached_get_adj (path=0x7fffb59cbe34, link=VNET_LINK_IP4) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_path.c:609

#9 0x00007ffff7465305 in fib_path_resolve (path_index=14) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_path.c:1586

#10 0x00007ffff745f7f2 in fib_path_list_resolve (path_list=0x7fffb5936798) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_path_list.c:566

#11 0x00007ffff745fc4b in fib_path_list_create (flags=FIB_PATH_LIST_FLAG_NONE, rpaths=0x7fffb5977774) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_path_list.c:718

#12 0x00007ffff7452f50 in fib_entry_src_interface_path_swap (src=0x7fffb5991188, entry=0x7fffb59b0cec, pl_flags=FIB_PATH_LIST_FLAG_NONE, paths=0x7fffb5977774) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_entry_src_interface.c:61

#13 0x00007ffff74511af in fib_entry_src_action_path_swap (fib_entry=0x7fffb59b0cec, source=FIB_SOURCE_INTERFACE, flags=(FIB_ENTRY_FLAG_CONNECTED | FIB_ENTRY_FLAG_ATTACHED), rpaths=0x7fffb5977774) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_entry_src.c:1172

#14 0x00007ffff744c8ef in fib_entry_create (fib_index=0, prefix=0x7fffb59ea6e0, source=FIB_SOURCE_INTERFACE, flags=(FIB_ENTRY_FLAG_CONNECTED | FIB_ENTRY_FLAG_ATTACHED), paths=0x7fffb5977774) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_entry.c:674

#15 0x00007ffff7439a93 in fib_table_entry_update (fib_index=0, prefix=0x7fffb59ea6e0, source=FIB_SOURCE_INTERFACE, flags=(FIB_ENTRY_FLAG_CONNECTED | FIB_ENTRY_FLAG_ATTACHED), paths=0x7fffb5977774) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_table.c:730

#16 0x00007ffff7439cf6 in fib_table_entry_update_one_path (fib_index=0, prefix=0x7fffb59ea6e0, source=FIB_SOURCE_INTERFACE, flags=(FIB_ENTRY_FLAG_CONNECTED | FIB_ENTRY_FLAG_ATTACHED), next_hop_proto=FIB_PROTOCOL_IP4, next_hop=0x0, next_hop_sw_if_index=5, next_hop_fib_index=4294967295, next_hop_weight=1, next_hop_labels=0x0, path_flags=FIB_ROUTE_PATH_FLAG_NONE) at /home/ayourtch/vpp/build-data/../src/vnet/fib/fib_table.c:780

#17 0x00007ffff6f3ccd6 in ip4_add_interface_routes (sw_if_index=5, im=0x7ffff789bac0 <ip4_main>, fib_index=0, a=0x7fffb597baf8) at /home/ayourtch/vpp/build-data/../src/vnet/ip/ip4_forward.c:723

#18 0x00007ffff6f3daf4 in ip4_add_del_interface_address_internal (vm=0x7ffff7ba1400 <vlib_global_main>, sw_if_index=5, address=0x7fffb5957632, address_length=24, is_del=0) at /home/ayourtch/vpp/build-data/../src/vnet/ip/ip4_forward.c:955

#19 0x00007ffff6f3dc69 in ip4_add_del_interface_address (vm=0x7ffff7ba1400 <vlib_global_main>, sw_if_index=5, address=0x7fffb5957632, address_length=24, is_del=0) at /home/ayourtch/vpp/build-data/../src/vnet/ip/ip4_forward.c:980

#20 0x00007ffff6d4b205 in vl_api_sw_interface_add_del_address_t_handler (mp=0x7fffb5957620) at /home/ayourtch/vpp/build-data/../src/vnet/interface_api.c:292

#21 0x00007ffff7bcc1df in vl_msg_api_process_file (vm=0x7ffff7ba1400 <vlib_global_main>, filename=0x7fffb595a1bc "/home/ayourtch/acl1.api", first_index=0, last_index=149, which=REPLAY) at /home/ayourtch/vpp/build-data/../src/vlibmemory/memory_vlib.c:1737

#22 0x00007ffff7bcc7a2 in api_trace_command_fn (vm=0x7ffff7ba1400 <vlib_global_main>, input=0x7fffb59eaec0, cmd=0x7fffb5a14404) at /home/ayourtch/vpp/build-data/../src/vlibmemory/memory_vlib.c:1846

#23 0x00007ffff78cd409 in vlib_cli_dispatch_sub_commands (vm=0x7ffff7ba1400 <vlib_global_main>, cm=0x7ffff7ba15f0 <vlib_global_main+496>, input=0x7fffb59eaec0, parent_command_index=643) at /home/ayourtch/vpp/build-data/../src/vlib/cli.c:588

#24 0x00007ffff78cd319 in vlib_cli_dispatch_sub_commands (vm=0x7ffff7ba1400 <vlib_global_main>, cm=0x7ffff7ba15f0 <vlib_global_main+496>, input=0x7fffb59eaec0, parent_command_index=0) at /home/ayourtch/vpp/build-data/../src/vlib/cli.c:566

#25 0x00007ffff78cd6f1 in vlib_cli_input (vm=0x7ffff7ba1400 <vlib_global_main>, input=0x7fffb59eaec0, function=0x7ffff7959e50 <unix_vlib_cli_output>, function_arg=0) at /home/ayourtch/vpp/build-data/../src/vlib/cli.c:662

#26 0x00007ffff795f351 in unix_cli_process_input (cm=0x7ffff7ba1280 <unix_cli_main>, cli_file_index=0) at /home/ayourtch/vpp/build-data/../src/vlib/unix/cli.c:2189

#27 0x00007ffff795fdc3 in unix_cli_process (vm=0x7ffff7ba1400 <vlib_global_main>, rt=0x7fffb59da000, f=0x0) at /home/ayourtch/vpp/build-data/../src/vlib/unix/cli.c:2286

#28 0x00007ffff78f549e in vlib_process_bootstrap (_a=140736249666048) at /home/ayourtch/vpp/build-data/../src/vlib/main.c:1261

#29 0x00007ffff62068dc in clib_calljmp () at /home/ayourtch/vpp/build-data/../src/vppinfra/longjmp.S:110

#30 0x00007fffb62b19d0 in ?? ()

#31 0x00007ffff78f55d3 in vlib_process_startup (vm=0x14, p=0x8, f=0x7fffb5a02fb4) at /home/ayourtch/vpp/build-data/../src/vlib/main.c:1283

#32 0x0000000000000000 in ?? ()

(gdb)

Assignee

Florin Coras

Reporter

Andrew Yourtchenko

Comments

  • ayourtch (Wed, 2 Aug 2017 16:15:17 +0000): Issue in a different place in the code (acl-plugin new hash-based matching code)
  • ayourtch (Wed, 2 Aug 2017 16:14:14 +0000): Yeah, so it looks like the underlying issue is actually similar to VPP-910: it was my acl-plugin code that had a memory corruption. I think I have fixed it by now, and with the fix I can replay the trace without any problem whatsoever. So, sorry for the false alarm, and mea culpa!
  • florin.coras (Wed, 26 Jul 2017 08:02:27 +0000): Understood and thanks for that! Still, it would be great if you could 'encourage' the original reporter of VPP-910 to provide a bit more context. After a quick look I couldn't pinpoint it, but I suspect the issue may be a misconfiguration that somehow forced the lisp interface to act as a 'non-encapsulating' interface.
  • ayourtch (Wed, 26 Jul 2017 05:56:27 +0000): Florin, I was just replaying the trace in order to reproduce VPP-910, and saw that it triggered a different backtrace instead. For my reproduction purposes I had just zeroed the respective print handler within the API ("set api_main.msg_print_handlers[41] = 0" in gdb) before replaying it, so it was not a problem.

But I asked on the call about this behaviour, and the feedback was that I should open a JIRA ticket anyway.

  • florin.coras (Tue, 25 Jul 2017 23:47:34 +0000): Andrew, could you provide more context as to what that trace was trying to achieve? As the assert suggests, lisp_gpe_build_rewrite should never be called.
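
For readers unfamiliar with the "should never be called" pattern mentioned in the comments, here is a rough sketch of how a generic rewrite path can end up in such a stub. The backtrace shows `vnet_rewrite_for_sw_interface` calling `lisp_gpe_build_rewrite`, which asserts unconditionally; everything in the code below (`if_class_t`, `rewrite_for_interface`, `never_called_build_rewrite`, `ethernet_like_build_rewrite`, the `TRAP_ABORTS` macro and the 14-byte placeholder header) is hypothetical and only illustrates the idea, not the VPP implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { LINK_IP4, LINK_ARP } link_type_t; /* hypothetical link types */

typedef size_t (*build_rewrite_fn)(link_type_t link, unsigned char *buf,
                                   size_t max);

/* Hypothetical interface class with a per-class rewrite callback. */
typedef struct {
    const char *name;
    build_rewrite_fn build_rewrite;
} if_class_t;

int trap_hits; /* how often the "never called" stub was reached */

/*
 * Stub for interface classes that build their encapsulation elsewhere.
 * Reaching it means the generic path was used on the wrong kind of
 * interface. With TRAP_ABORTS defined it aborts, like an assert(0).
 */
size_t never_called_build_rewrite(link_type_t link, unsigned char *buf,
                                  size_t max)
{
    (void)link; (void)buf; (void)max;
    trap_hits++;
#ifdef TRAP_ABORTS
    assert(0 && "build_rewrite reached on a tunnel-style interface");
#endif
    return 0;
}

/* Ordinary class: writes a 14-byte placeholder link-layer header. */
size_t ethernet_like_build_rewrite(link_type_t link, unsigned char *buf,
                                   size_t max)
{
    (void)link;
    if (max < 14)
        return 0;
    memset(buf, 0, 14);
    return 14;
}

/* Generic path: blindly dispatches to whatever the class registered. */
size_t rewrite_for_interface(const if_class_t *cls, link_type_t link,
                             unsigned char *buf, size_t max)
{
    return cls->build_rewrite(link, buf, max);
}
```

In this sketch, adding a connected route on the tunnel-style class would go through `rewrite_for_interface` and hit the stub, which mirrors the shape of frames #5 to #7 in the backtrace. In the actual report the trigger turned out to be memory corruption elsewhere rather than a legitimate call.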

Original issue: https://jira.fd.io/browse/VPP-911
