Mismatched Indirection Table error on servicing #771

Draft
wants to merge 1 commit into
base: main


@erfrimod erfrimod commented Feb 3, 2025

The netvsp driver has a baked-in assumption that the number of available resources cannot change during servicing. In rare cases this does not hold, such as FHR during an FPGA update, where the resources given may be reduced. (This issue was seen during a 48 VM bringup effort.)

I am attempting to fix this by resizing the indirection table in the hopes that restart_queues() will handle the new size.
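
As a rough illustration of the resize idea (not the code in this PR), the saved table could be truncated when the adapter reports a smaller size and repeated cyclically when it reports a larger one. The function name `resize_indirection_table`, its parameters, and the modulo clamp on queue indices are all assumptions made for this sketch; the real netvsp types and restore path may differ.

```rust
/// Hypothetical helper: rebuild a saved RSS indirection table so that it has
/// `new_len` entries, matching the size the adapter reports after servicing.
///
/// Assumptions (not taken from netvsp):
/// - entries are queue indices stored as `u16`
/// - `queue_count` is the number of queues available after servicing
/// - entries pointing past `queue_count` are wrapped with a modulo
fn resize_indirection_table(saved: &[u16], new_len: usize, queue_count: u16) -> Vec<u16> {
    // With nothing to copy from (or no queues), fall back to queue 0 everywhere.
    if saved.is_empty() || queue_count == 0 {
        return vec![0; new_len];
    }
    saved
        .iter()
        .cycle() // growing: repeat the saved pattern
        .take(new_len) // shrinking: keep only the first `new_len` entries
        .map(|&queue| queue % queue_count) // keep indices within the available queues
        .collect()
}
```

For the 8 to 32 case in the logs below, a saved table of `[0, 1, 2, 3, 0, 1, 2, 3]` would be repeated four times; for 32 to 8, only the first eight entries would be kept. Whether restart_queues() then picks up the new table is exactly what still needs to be confirmed.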

TODO:

  • I've added code to randomly reduce/increase the adapter indirection table size reported to netvsp. This is just for testing on my lab machine and will be removed.
  • Find evidence that the RSS table is actually changing.

Kmsg logs from testing a resize from 8 to 32

[1.566393] netvsp: ERROR worker_new{ name="UnderhillWorker" action="new"}:init:init/new_underhill_vm{ correlation_id=9d2e061b-aecb-45bb-a904-c2939bce854c}:  ERIK Arc::new Adapter multiqueue.indirection_table_size 8 count 27 even false table_size 32
[1.566734] vmbus_server::channels: INFO  new channel offer_id=OfferId(0) key={f8615163-df3e-46c5-913f-f2d2f965ed0e}-{f8615163-0000-1000-2000-0204200f3be1}-0 confidential_ring_buffer=true confidential_external_memory=false
[1.573569] vfio-pci 721a:00:00.0: enabling device (0000 -> 0002)
[1.655246] vfio-pci 721a:00:00.0: vfio-noiommu device opened by user (tp:82)
[1.850159] vmbus_server::channels: INFO  new channel offer_id=OfferId(1) key={ba6163d9-04a1-4d29-b605-72e2ffb1dc7f}-{ca56751f-e643-4bef-bf54-f73678e8b7b5}-0 confidential_ring_buffer=true confidential_external_memory=false
[1.851541] vmbus_server::channels: INFO  new channel offer_id=OfferId(2) key={0e0b6031-5213-4934-818b-38d90ced39db}-{b6650ff7-33bc-4840-8048-e0676786f393}-0 confidential_ring_buffer=true confidential_external_memory=false
[1.863060] vmbus_server::channels::saved_state: INFO  channel restored key={f8615163-df3e-46c5-913f-f2d2f965ed0e}-{f8615163-0000-1000-2000-0204200f3be1}-0 state=Open
[1.863375] vmbus_server::channels::saved_state: INFO  channel restored key={ba6163d9-04a1-4d29-b605-72e2ffb1dc7f}-{ca56751f-e643-4bef-bf54-f73678e8b7b5}-0 state=Open
[1.864251] vmbus_server::channels::saved_state: INFO  channel restored key={0e0b6031-5213-4934-818b-38d90ced39db}-{b6650ff7-33bc-4840-8048-e0676786f393}-0 state=Open
[1.864589] vmbus_server::channels::saved_state: INFO  channel restored key={525074dc-8985-46e2-8057-a307dc18a502}-{1eccfd72-4b41-45ef-b73a-4a6e44c12924}-0 state=Open
[1.864782] vmbus_server::channels::saved_state: INFO  channel restored key={cfa8b69e-5b4a-4cc0-b98b-8ba1a1f3f95a}-{58f75a6d-d949-4320-99e1-a2a2576d581c}-0 state=Open
[1.864985] vmbus_server::channels::saved_state: INFO  channel restored key={f912ad6d-2b17-48ea-bd65-f927a61c7684}-{d34b2567-b9b6-42b9-8778-0a4ec0b955bf}-0 state=Open
[1.865198] vmbus_server::channels::saved_state: INFO  channel restored key={da0a7802-e377-4aac-8e77-0558eb1073f8}-{5620e0c7-8062-4dce-aeb7-520c7ef76171}-0 state=Open
[1.865497] vmbus_server::channels::saved_state: INFO  channel restored key={57164f39-9115-4e78-ab55-382f3bd5422d}-{fd149e91-82e0-4a7d-afa6-2a4166cbd7c0}-0 state=Open
[1.865761] vmbus_server::channels::saved_state: INFO  channel restored key={a9a0f4e7-5a45-4d96-b827-8a841e8c03e6}-{242ff919-07db-4180-9c2e-b86cb68c8c55}-0 state=Open
[1.866009] vmbus_server::channels::saved_state: INFO  channel restored key={9527e630-d0ae-497b-adce-e80ab0175caf}-{2dd1ce17-079e-403c-b352-a1921ee207ee}-0 state=Open
[1.866222] vmbus_server::channels::saved_state: INFO  channel restored key={ba6163d9-04a1-4d29-b605-72e2ffb1dc7f}-{a87a3455-592f-4a91-a666-e6e4f9b422c1}-0 state=Open
[1.866469] vmbus_server::channels::saved_state: INFO  channel restored key={ba6163d9-04a1-4d29-b605-72e2ffb1dc7f}-{ca56751f-e643-4bef-bf54-f73678e8b7b5}-1 state=Open
[1.866918] vmbus_server::channels::saved_state: INFO  channel restored key={ba6163d9-04a1-4d29-b605-72e2ffb1dc7f}-{ca56751f-e643-4bef-bf54-f73678e8b7b5}-2 state=Open
[1.867116] vmbus_server::channels::saved_state: INFO  channel restored key={ba6163d9-04a1-4d29-b605-72e2ffb1dc7f}-{ca56751f-e643-4bef-bf54-f73678e8b7b5}-3 state=Open
[1.867232] virt_mshv_vtl::processor::mshv::x64::save_restore: ERROR  previous version of underhill did not save startup_suspend state vp_index=0x3 rip=0xffffffffa1d25094 rflags=0x242 cr0=0x80050033 efer=0xd01
[1.867309] vmbus_server::channels::saved_state: INFO  channel restored key={f8615163-df3e-46c5-913f-f2d2f965ed0e}-{f8615163-0000-1000-2000-0204200f3be1}-1 state=Open
[1.867498] vmbus_server::channels::saved_state: INFO  channel restored key={f8615163-df3e-46c5-913f-f2d2f965ed0e}-{f8615163-0000-1000-2000-0204200f3be1}-2 state=Open
[1.867677] vmbus_server::channels::saved_state: INFO  channel restored key={f8615163-df3e-46c5-913f-f2d2f965ed0e}-{f8615163-0000-1000-2000-0204200f3be1}-3 state=Open
[1.867905] virt_mshv_vtl::processor::mshv::x64::save_restore: ERROR  previous version of underhill did not save startup_suspend state vp_index=0x1 rip=0xffffffffa1d25094 rflags=0x246 cr0=0x80050033 efer=0xd01
[1.867936] vmbus_server::channels::saved_state: INFO  channel restored key={44c4f61d-4444-4400-9d52-802e27ede19f}-{bde7fac8-e40a-4399-843a-4fc8196fb662}-0 state=Open
[1.869059] virt_mshv_vtl::processor::mshv::x64::save_restore: ERROR  previous version of underhill did not save startup_suspend state vp_index=0x2 rip=0xffffffffa1d25094 rflags=0x242 cr0=0x80050033 efer=0xd01
[1.870303] vmbus_client::saved_state: INFO  channel restored key={cfa8b69e-5b4a-4cc0-b98b-8ba1a1f3f95a}-{58f75a6d-d949-4320-99e1-a2a2576d581c}-0 state=Opened
[1.870636] vmbus_client::saved_state: INFO  channel restored key={da0a7802-e377-4aac-8e77-0558eb1073f8}-{5620e0c7-8062-4dce-aeb7-520c7ef76171}-0 state=Opened
[1.870887] vmbus_client::saved_state: INFO  channel restored key={525074dc-8985-46e2-8057-a307dc18a502}-{1eccfd72-4b41-45ef-b73a-4a6e44c12924}-0 state=Opened
[1.871200] vmbus_client::saved_state: INFO  channel restored key={a9a0f4e7-5a45-4d96-b827-8a841e8c03e6}-{242ff919-07db-4180-9c2e-b86cb68c8c55}-0 state=Opened
[1.871420] netvsp: ERROR  ERIK restore rss_state: 8 adapter: 32
[1.872245] vmbus_client::saved_state: INFO  channel restored key={f912ad6d-2b17-48ea-bd65-f927a61c7684}-{d34b2567-b9b6-42b9-8778-0a4ec0b955bf}-0 state=Opened
[1.872377] netvsp: WARN  ERIK MissmatchedIndirectionTableSize rss_state: 8 adapter: 32 RESIZING
[1.872580] vmbus_client::saved_state: INFO  channel restored key={9527e630-d0ae-497b-adce-e80ab0175caf}-{2dd1ce17-079e-403c-b352-a1921ee207ee}-0 state=Opened
[1.872839] vmbus_client::saved_state: INFO  channel restored key={57164f39-9115-4e78-ab55-382f3bd5422d}-{fd149e91-82e0-4a7d-afa6-2a4166cbd7c0}-0 state=Opened
[1.873100] vmbus_client::saved_state: INFO  channel restored key={44c4f61d-4444-4400-9d52-802e27ede19f}-{bde7fac8-e40a-4399-843a-4fc8196fb662}-0 state=Opened
[1.873353] vmbus_client::saved_state: INFO  channel restored key={0e0b6031-5213-4934-818b-38d90ced39db}-{b6650ff7-33bc-4840-8048-e0676786f393}-0 state=Offered
[1.873603] vmbus_client::saved_state: INFO  channel restored key={ba6163d9-04a1-4d29-b605-72e2ffb1dc7f}-{a87a3455-592f-4a91-a666-e6e4f9b422c1}-0 state=Opened
[1.878161] vmbus_relay_intercept_device: INFO  matching channel offered offer=OfferInfo { offer: OfferChannel { interface_id: 0e0b6031-5213-4934-818b-38d90ced39db, instance_id: b6650ff7-33bc-4840-8048-e0676786f393, rsvd: [0, 0, 0, 0], flags: OfferFlags { enumerate_device_interface: true, confidential_ring_buffer: false, confidential_external_memory: false, named_pipe_mode: true, tlnpi_provider: false }, mmio_megabytes: 0, user_defined: UserDefinedData([4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), subchannel_index: 0, mmio_megabytes_optional: 0, channel_id: ChannelId(7), monitor_id: ff, monitor_allocated: 0, is_dedicated: 1, connection_id: 2007 }, request_se
[1.878175] vmbus_relay_intercept_device: nd: SenderCore(ManuallyDrop { value: Queue { remote: OnceLock(<uninit>), local: Mutex { data: LocalQueue { messages: ErasedVecDeque { cap: 0, head: 0, len: 0 }, ports: [], waker: None, remote: false, receiver_gone: false, remove_closed: false, new_handler: 0x7f63c013b560 } } } }), response_recv: ReceiverCore { queue: ReceiverQueue(Queue { remote: OnceLock(<uninit>), local: Mutex { data: LocalQueue { messages: ErasedVecDeque { cap: 0, head: 0, len: 0 }, ports: [], waker: None, remote: false, receiver_gone: false, remove_closed: false, new_handler: 0x7f63c013b560 } } }), ports: PortHandlerList([]), terminated: false } }
[1.881567] state_unit: INFO worker_new{ name="UnderhillWorker" action="new"}:init:init/restore{ correlation_id=9d2e061b-aecb-45bb-a904-c2939bce854c}:restore_units:state_change{ operation="restore"}:  state change complete duration=24.682542ms
[1.897072] state_unit: INFO worker_new{ name="UnderhillWorker" action="new"}:init:init/restore{ correlation_id=9d2e061b-aecb-45bb-a904-c2939bce854c}:restore_units:state_change{ operation="post_restore"}:  state change complete duration=14.103998ms
[1.899309] underhill_core: INFO  vm worker started
[1.901515] netvsp: ERROR  ERIK Coordinator::restart_queues enter
[1.915329] state_unit: INFO state_change{ operation="start"}:  state change complete duration=16.585312ms
[1.915579] underhill_core::dispatch: INFO  resuming VM correlation_id=9d2e061b-aecb-45bb-a904-c2939bce854c blackout_time_ms=0x99c blackout_time="2.4609908s"
[1.921186] netvsp: ERROR  ERIK Coordinator::restart_queues DONE RSS State Some indirection table len: 32
[1.921567] vmbus_client: INFO  opening channel on host channel_id=0x7
[1.923202] hyperv_ic_guest::shutdown: INFO  version negotiated framework_version=3.0 message_version=3.2
[1.930100] netvsp: INFO  Query data path state is_data_path_switched=true
[1.945662] mana_driver::gdma_driver: INFO  retargeting EQ 0 to cpu: 2
[1.945959] underhill_core::emuplat::netvsp: INFO  Adding VF to VTL0 vfid=0xbde7fac8
[1.963619] mana_driver::gdma_driver: INFO  retargeting EQ 1 to cpu: 3
[1.964610] mana_driver::gdma_driver: INFO  retargeting EQ 2 to cpu: 0
[1.980813] mana_driver::gdma_driver: INFO  retargeting EQ 3 to cpu: 1

Kmsg logs from testing a resize from 32 to 8

[1.546240] netvsp: ERROR worker_new{ name="UnderhillWorker" action="new"}:init:init/new_underhill_vm{ correlation_id=aa32dd7e-b54a-4130-9816-b21b1e9bf358}:  ERIK Arc::new Adapter multiqueue.indirection_table_size 8 count 24 even true table_size 8
...
[1.809862] netvsp: ERROR  ERIK restore rss_state: 32 adapter: 8
[1.810054] netvsp: WARN  ERIK MissmatchedIndirectionTableSize rss_state: 32 adapter: 8 RESIZING
[1.812162] virt_mshv_vtl::processor::mshv::x64::save_restore: ERROR  previous version of underhill did not save startup_suspend state vp_index=0x2 rip=0xffffffffa1d25094 rflags=0x252 cr0=0x80050033 efer=0xd01
[1.812688] virt_mshv_vtl::processor::mshv::x64::save_restore: ERROR  previous version of underhill did not save startup_suspend state vp_index=0x1 rip=0xffffffffa1d25094 rflags=0x242 cr0=0x80050033 efer=0xd01
[1.817467] virt_mshv_vtl::processor::mshv::x64::save_restore: ERROR  previous version of underhill did not save startup_suspend state vp_index=0x3 rip=0xffffffffa1d25094 rflags=0x256 cr0=0x80050033 efer=0xd01
[1.818014] state_unit: INFO worker_new{ name="UnderhillWorker" action="new"}:init:init/restore{ correlation_id=aa32dd7e-b54a-4130-9816-b21b1e9bf358}:restore_units:state_change{ operation="restore"}:  state change complete duration=27.219952ms
[1.825639] state_unit: INFO worker_new{ name="UnderhillWorker" action="new"}:init:init/restore{ correlation_id=aa32dd7e-b54a-4130-9816-b21b1e9bf358}:restore_units:state_change{ operation="post_restore"}:  state change complete duration=7.192347ms
[1.830249] underhill_core: INFO  vm worker started
[1.831750] vmbus_client: INFO  opening channel on host channel_id=0x7
[1.834195] hyperv_ic_guest::shutdown: INFO  version negotiated framework_version=3.0 message_version=3.2
[1.834642] netvsp: ERROR  ERIK Coordinator::restart_queues enter
[1.842460] netvsp: ERROR  ERIK Coordinator::restart_queues DONE RSS State Some indirection table len: 8
[1.845449] state_unit: INFO state_change{ operation="start"}:  state change complete duration=18.389969ms
[1.846135] underhill_core::dispatch: INFO  resuming VM correlation_id=aa32dd7e-b54a-4130-9816-b21b1e9bf358 blackout_time_ms=0x92f blackout_time="2.3514541s"
[1.853741] netvsp: INFO  Query data path state is_data_path_switched=true
[1.859030] mana_driver::gdma_driver: INFO  retargeting EQ 0 to cpu: 2
[1.859241] underhill_core::emuplat::netvsp: INFO  Adding VF to VTL0 vfid=0xbde7fac8
[1.862926] mana_driver::gdma_driver: INFO  retargeting EQ 1 to cpu: 3
[1.863241] mana_driver::gdma_driver: INFO  retargeting EQ 2 to cpu: 0
[1.866643] mana_driver::gdma_driver: INFO  retargeting EQ 3 to cpu: 1
