feat: implement secure node join flow #924

Merged
merged 1 commit into from
Feb 14, 2025
feat: implement secure node join flow
Fixes: #840

This PR changes the Talos machine join flow drastically:

- a newly joined machine is first put into a limbo state, where Omni
  creates a temporary WireGuard connection to it.
- the controller picks it up and tries to write a unique machine token
  to the newly joined machine; in the meantime it also resolves UUID
  conflicts automatically and writes a UUID override to the META
  partition.
- the machine re-joins Omni, now with the unique token.
- the unique token is saved in the `siderolink.Link` resource, and any
  subsequent join checks that `siderolink.Link` has a matching unique
  token.

The Siderolink manager was refactored, as it was a huge, monolithic,
poorly testable chunk. It was split into:
- LinkStatus controller, which creates/removes WireGuard peers.
- PendingMachineStatus controller, which ensures all joined machines
  have unique node tokens.
- Provision handler, which implements the gRPC server and now holds all
  logic related to machine acceptance.
- PeersPool, which is used by the LinkStatus controllers to deduplicate
  peer creation and reuse peers when possible.
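
The deduplication idea behind PeersPool can be sketched as a reference-counted map. This is only an illustration under assumptions: the `peersPool` type, its `Add`/`Release` methods, and the use of owner IDs are hypothetical, and the counters stand in for the real WireGuard peer operations.

```go
package main

import (
	"fmt"
	"sync"
)

// peersPool is a hypothetical sketch: it tracks which owners reference each
// public key, so a peer is created once and removed with its last owner.
type peersPool struct {
	mu      sync.Mutex
	owners  map[string]map[string]struct{} // public key -> set of owner IDs
	created int                            // stands in for real peer creation
	removed int                            // stands in for real peer removal
}

func newPeersPool() *peersPool {
	return &peersPool{owners: map[string]map[string]struct{}{}}
}

// Add registers owner for the peer and reports whether a new peer was created.
func (p *peersPool) Add(publicKey, owner string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()

	set, exists := p.owners[publicKey]
	if !exists {
		set = map[string]struct{}{}
		p.owners[publicKey] = set
		p.created++
	}

	set[owner] = struct{}{}

	return !exists
}

// Release drops the owner's reference and reports whether the peer was removed.
func (p *peersPool) Release(publicKey, owner string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()

	set, exists := p.owners[publicKey]
	if !exists {
		return false
	}

	delete(set, owner)

	if len(set) > 0 {
		return false
	}

	delete(p.owners, publicKey)
	p.removed++

	return true
}

func main() {
	pool := newPeersPool()

	fmt.Println(pool.Add("pubkey-1", "link-a"))     // first owner creates the peer
	fmt.Println(pool.Add("pubkey-1", "link-b"))     // second owner reuses it
	fmt.Println(pool.Release("pubkey-1", "link-a")) // still referenced by link-b
	fmt.Println(pool.Release("pubkey-1", "link-b")) // last owner removes the peer
}
```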

Additionally, updated the siderolink loghandler to reject logger
connections from machines which do not have corresponding log buffers.

Nodes which do not support the secure flow are still able to join by
default.
The secure join flow can be forced by setting the
`--disable-legacy-join-tokens` flag.
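
The effect of the flag can be sketched as a simple gate. Only the flag name comes from this commit; parsing it with the standard `flag` package, and the `parseDisableLegacy` and `allowJoin` helpers, are assumptions made for this sketch rather than how Omni wires its configuration.

```go
package main

import (
	"flag"
	"fmt"
)

// parseDisableLegacy reads the flag named in the commit message from args.
// Using the standard flag package here is an assumption of this sketch.
func parseDisableLegacy(args []string) (bool, error) {
	fs := flag.NewFlagSet("join-sketch", flag.ContinueOnError)
	disable := fs.Bool("disable-legacy-join-tokens", false, "force the secure join flow")

	if err := fs.Parse(args); err != nil {
		return false, err
	}

	return *disable, nil
}

// allowJoin is a hypothetical policy check: a machine presenting a unique
// token passes this gate, while a legacy machine passes only as long as
// legacy joins have not been disabled.
func allowJoin(hasUniqueToken, disableLegacy bool) bool {
	return hasUniqueToken || !disableLegacy
}

func main() {
	disable, err := parseDisableLegacy([]string{"--disable-legacy-join-tokens"})
	if err != nil {
		panic(err)
	}

	fmt.Println("legacy machine allowed:", allowJoin(false, disable))
	fmt.Println("secure machine allowed:", allowJoin(true, disable))
}
```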

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Unix4ever committed Feb 14, 2025

commit 9bb85f80344b47d82cb0f1458fa4257711ffeefb
242 changes: 193 additions & 49 deletions client/api/omni/specs/siderolink.pb.go

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions client/api/omni/specs/siderolink.proto
@@ -28,6 +28,19 @@ message SiderolinkSpec {
// RemoteAddr is the machine address how it's visible from Omni
// it is determined by reading X-Forwarded-For header coming from the gRPC API.
string remote_addr = 8;
// NodeUniqueToken is the per node join token which is saved in the Node META partition after
// the machine is accepted in Omni.
// Only for Talos >= 1.6.
string node_unique_token = 9;
}

// LinkStatusSpec is created when the link peer event was submitted.
message LinkStatusSpec {
string node_subnet = 1;
string node_public_key = 2;
string virtual_addrport = 3;
Member: Why virtual?

Member Author: That's for WireGuard over gRPC. We keep it there so we can track when it is updated.

// LinkId is the ID of the resource which created the link status.
string link_id = 4;
}

// SiderolinkConnectionSpec describes each node connection information.
@@ -55,3 +68,8 @@ message ConnectionParamsSpec {
// LogsPort is the logs port.
int32 logs_port = 10;
}

// PendingMachineStatusSpec describes the spec of the pending machine status resource.
message PendingMachineStatusSpec {
string token = 1;
}