Skip to content

CP-44103: Ordering network devices #6381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

minglumlu
Copy link
Member

@minglumlu minglumlu commented Mar 21, 2025

This part sorts host network devices in xcp-networkd. Previously, the
ordering was handled by the interface-rename functionality. This will
now be replaced by an equivalent function in xcp-networkd, but without
renaming the network devices. The renmaing performed by the
interface-rename was used to record the sorting result as the name of
the NICs like "eth". Now the sorting result will be recorded as
internal files.

@minglumlu minglumlu force-pushed the private/mingl/CP-44103 branch from b8aa61c to 1d1da55 Compare March 21, 2025 12:08
@minglumlu minglumlu changed the base branch from master to feature/host-network-device-ordering March 21, 2025 12:13
@minglumlu minglumlu force-pushed the private/mingl/CP-44103 branch from 1d1da55 to 898b31d Compare March 21, 2025 12:27
@psafont
Copy link
Member

psafont commented Mar 21, 2025

The code is non-trivial, with lots of complexity. I think a design of the algorithm that can rationalize the different uses-cases is needed before accepting the code

@minglumlu
Copy link
Member Author

minglumlu commented Mar 25, 2025

The code is non-trivial, with lots of complexity. I think a design of the algorithm that can rationalize the different uses-cases is needed before accepting the code

A separate PR #6387 is raised for this.


open D

let last_file_path = "/var/lib/xcp/last_ordered_network_devices.json"
Copy link
Contributor

@lindig lindig Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have all information in a single file? I would expect that we save the current order plus ethX slots that were previously used but are currently empty. Why do we need two files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a separate conversation I have suggested extending the existing config_t record with extra fields for these new lists. This is serialised as JSON to the file networkd.db along with other networkd state.


let index_of_string ty value =
match ty with
| ty when ty = "pci" ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings can be matched directly without when

@lindig
Copy link
Contributor

lindig commented Mar 27, 2025

The list of interfaces per host is short. So I am not sure we need sophisticated data structures if we have lists with 10 elements at most. I also believe, the fundamental function that assigns NICs to ETH slots should be short and take in: the list of NICs and the previous assignment and create a new assignment (which then is persisted to a file). I do think it is a good idea to have modules for the different objects involved, like PCI slots, NICs, and MACs. So I believe less code could solve the task.

@minglumlu minglumlu force-pushed the private/mingl/CP-44103 branch 2 times, most recently from 8283b5a to 79254e5 Compare April 14, 2025 04:55
@minglumlu minglumlu marked this pull request as ready for review April 14, 2025 05:52
@minglumlu minglumlu force-pushed the private/mingl/CP-44103 branch 3 times, most recently from 77284fd to bcb1845 Compare April 14, 2025 08:24
(** Type of one mapping configuration. *)
type t = {position: int; index: index}

val rules_of_file : path:string -> t list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler name like "read" (because this is inside the Rule module already would work, too.

(** [Duplicate_position] is raised when duplicate position is specified in the rules. *)
exception Duplicate_position

val matched : mac:Macaddr.t -> pci:Pciaddr.t -> label:string -> t -> bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "matches" work better purely from a grammatical point of view?

; mac: Network_interface.mac_address
; pci: Xcp_pci.address
; bios_eth_order: int
; multinic: bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest multi_nic

name: Network_interface.iface
; mac: Network_interface.mac_address
; pci: Xcp_pci.address
; bios_eth_order: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coud this be negative? Otherwise add a comment that documents the range. Is this zero- or one-based?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a doc for this field.

exception Duplicate_mac_address
end

val generate : OrderedDev.t list -> OrderedDev.t list * (string * string) list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest "order" or "sort" as a name.

(acc_ordered, List.rev_append unordered_devs acc_unordered)
)
(ListToPciaddrMap.to_1n_map ~by:(fun v -> v.Dev.pci) multinics)
([], [])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a little complicated. Try to understand:
The function handles the remaining devices that are multi-nics after assign_position_by_rules, mac and pci
s1) Totally, these devices need to rearrange the N of ethN based on mac addresses because biosdevname's outcome is not stable for the multi-nics(see doc) and go to assign_position_for_remaining.
s2) There is a special scenario to handle: all mac addresses of multinics are changed, then assign the last positions to them. All other secanrios follow s1.
Notice that assign_position_by_pci skips the multinics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all MAC addresses change, they will be considered as new devices.

Alcotest.(check bool) "is Ok" true (Result.is_ok order) ;
let order = Result.get_ok order in
Alcotest.(check int) "6 devices in the order" 6 (List.length order) ;
Alcotest.(check int) "position assigned" 5 (pos_of_mac mac_addr0 order) ;
Copy link
Member

@psafont psafont Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like all this hardcoding, This means that the order is in two places now (the source of truth, rules in this case), and in the alcotest checks. Not only this is confusing, but it allows for errors

I'm preparing a branch with a more parametrized way of writing the tests.

It looks like this:

let test_default () =
  let order =
    sort' ~currents:current_slots ~rules:[] ~last_order:[] |> Result.get_ok
  in
  let test_position_and_presence mac =
    test_position_and_presence (expected_of_dev_mac current_slots mac) order mac
  in

  Alcotest.(check int) "6 devices in the order" 6 (List.length order) ;
  test_position_and_presence mac_addr0 ;
  test_position_and_presence mac_addr1 ;
  test_position_and_presence mac_addr2 ;
  test_position_and_presence mac_addr3 ;
  test_position_and_presence mac_addr4 ;
  test_position_and_presence mac_addr5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I updated the testing code based on the suggestions.

name= "eno6"
; pci= pci_addr6
; mac= mac_addr6
; bios_eth_order= 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with currents !

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to test new adding devices (first seen). They are expected to be sorted by bios_eth_order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that currents contains dev1 which has bios_eth_order= 1. This means that when concatenating the values to form a new currents, there are two interfaces in the same order / slot. I thought this went against what biosdevname does. (we could add a unit-test ensuring this as well)

Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tests need to improve, they have duplication, which lead to mistakes and some scenarios are not clear to me. I've prepared a branch with suggestions which include comments marked with XXX that should be resolved / explained.

https://github.com/psafont/xen-api/pull/new/private/mingl/CP-44103

@minglumlu minglumlu force-pushed the private/mingl/CP-44103 branch 2 times, most recently from cfc0849 to 77e417f Compare April 17, 2025 11:49
; pci= pci_addr4
; mac= mac_addr4
; bios_eth_order= 4
; multi_nic= true (* multinic: share PCI address with dev4 *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; multi_nic= true (* multinic: share PCI address with dev4 *)
; multi_nic= true (* multinic: share PCI address with dev5 *)

Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit-tests now look better, I'm unblocking the PR

@@ -0,0 +1,526 @@
(** Generete an order for host network devices and keep the order as stable as possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Generate"

Also, we need copyright header on top of the new files.

@minglumlu
Copy link
Member Author

Squashing the fixup commits ...

minglumlu added 3 commits May 6, 2025 11:16
This part sorts host network devices in xcp-networkd. Previously, the
ordering was handled by the interface-rename functionality. This will
now be replaced by an equivalent function in xcp-networkd, but without
renaming the network devices. The renmaing performed by the
interface-rename was used to record the sorting result as the name of
the NICs like "eth<N>". Now the sorting result will be saved in
xcp-networkd database.

Signed-off-by: Ming Lu <[email protected]>
The "test_network_device_order_inherited.ml" is inherited from the
interface-rename functionality.

Signed-off-by: Ming Lu <[email protected]>
@minglumlu minglumlu force-pushed the private/mingl/CP-44103 branch from 9ef757f to bdb023a Compare May 6, 2025 03:16
@minglumlu minglumlu merged commit c2756c6 into xapi-project:feature/host-network-device-ordering May 6, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants