CP-53711: Copy SSH settings from pool coordinator in pool join #6395

Merged

Contributor

@gangj gangj commented Apr 1, 2025

No description provided.

@@ -1286,14 +1294,14 @@ let sync_updates =
       param_type= String
     ; param_name= "username"
     ; param_doc= "The username of the remote pool"
-    ; param_release= numbered_release "25.6.0-next"
+    ; param_release= numbered_release "25.7.0"
Contributor

Should we update these release versions when the feature branch is ready to be merged into the master branch?

Contributor Author

This is a fixup for merged code, not from the feature branch.

Contributor

It's easy to forget to change the "-next" version to the release version, so please remember to do it when merging to master. BTW, gen_lifecycle checks datamodel_lifecycle.ml (see here). I think a similar method could be used for datamodel_pool and datamodel_host too; that can be considered in the future.

Contributor

BengangY commented Apr 1, 2025

"...will be dropped sson."
A spelling mistake in the description of the second commit.

@gangj gangj force-pushed the private/gangj/CP-53711 branch from bbfe5a4 to 01fe4e3 Compare April 1, 2025 04:36
Client.Host.get_console_idle_timeout ~rpc ~session_id
~self:remote_coordinator
in
(* Configure SSH service parameters in local DB to setup local SSH
Contributor

"to set up" ("setup" is a noun but you need a verb here)

out yet, the joiner will start SSH service with timeout
host.ssh_enabled_timeout, which means SSH service in the joiner will
be disabled later than in the new coordinator. *)
let ssh_expiry = Db.Host.get_ssh_expiry ~__context ~self:host_ref in
Contributor

The joining host is going through a reboot. What is the general policy for SSH over a reboot? If SSH was enabled before reboot, what is the state after reboot?

Contributor Author

@gangj gangj Apr 2, 2025

The joining host will not reboot; only xapi restarts to finish the join.
A host does go through a reboot when it is ejected from a pool, though. In the current design, I think the SSH service status is kept across a reboot:

  1. If the SSH service is enabled without a timeout, or is disabled, it does not change after the reboot.
  2. If the SSH service is enabled with a timeout, then after the reboot it is enabled with the remaining timeout and disabled once the time is up.
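
The two rules above can be sketched as a small pure function. This is only an illustration: the `ssh_state` type and the function names are hypothetical and not taken from xapi, and treating an expiry that has already passed as "disabled" is an assumption the comment does not spell out.

```ocaml
(* Hypothetical model of the SSH state described above; these names are
   illustrative and do not come from xapi. *)
type ssh_state =
  | Disabled
  | Enabled_forever
  | Enabled_until of float  (* absolute expiry, in seconds *)

(* State to restore after a reboot that completes at time [now].
   Assumption: an expiry that has already passed means SSH stays disabled. *)
let state_after_reboot ~now = function
  | Disabled ->
      Disabled
  | Enabled_forever ->
      Enabled_forever
  | Enabled_until expiry when expiry > now ->
      Enabled_until expiry
  | Enabled_until _ ->
      Disabled

(* Seconds left before SSH is disabled, if a timeout is still pending. *)
let remaining ~now = function
  | Enabled_until expiry when expiry > now ->
      Some (expiry -. now)
  | Disabled | Enabled_forever | Enabled_until _ ->
      None
```

For example, a host whose SSH expires at t = 160 and that comes back up at t = 100 keeps SSH enabled with 60 seconds remaining.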

Contributor

So the current policy is to sync the same status, including ssh_expiry, from the pool coordinator to the newly joined host. It relies on the reboot mechanism to ensure SSH is disabled eventually. Essentially, the newly joined host will follow the same process as the pool coordinator: reboot, check if the expiry time is greater than the current time, and trigger a disable.

Contributor Author

@gangj gangj Apr 2, 2025

> So the current policy is to sync the same status, including ssh_expiry, from the pool coordinator to the newly joined host.

No, ssh_expiry will be now + ssh_enabled_timeout for the newly joined host, as we discussed offline. Please check the code.
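
A minimal sketch of that rule, using plain floats for time. The helper name `expiry_after` is hypothetical, and reading a timeout of 0 as "no timeout" is an assumption made for this sketch.

```ocaml
(* Hypothetical helper, not xapi code: expiry = now + ssh_enabled_timeout.
   Assumption: a timeout of 0 means SSH stays enabled with no expiry. *)
let expiry_after ~now ~ssh_enabled_timeout =
  if ssh_enabled_timeout = 0L then
    None
  else
    Some (now +. Int64.to_float ssh_enabled_timeout)

(* The coordinator enabled SSH at t = 1000 with a 600 s timeout. *)
let coordinator_expiry = expiry_after ~now:1000. ~ssh_enabled_timeout:600L

(* A host joining at t = 1200 gets a fresh window from the join time. *)
let joiner_expiry = expiry_after ~now:1200. ~ssh_enabled_timeout:600L
```

Here the joiner's expiry (1800) is 200 seconds later than the coordinator's (1600), which matches the code comment in the diff saying that SSH on the joiner is disabled later than on the coordinator.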

Add "Changed" records for 2 APIs which were missed.

Fix "param_release" for 3 added parameters.

Signed-off-by: Gang Ji <[email protected]>
@gangj gangj force-pushed the private/gangj/CP-53711 branch from 01fe4e3 to bd6c586 Compare April 2, 2025 03:33
Contributor Author

gangj commented Apr 2, 2025

Force-pushed to rebase onto the latest feature branch, which has merged the latest master to fix the build failure.

@gangj gangj force-pushed the private/gangj/CP-53711 branch from bd6c586 to a09a2ed Compare April 2, 2025 03:43
@@ -170,13 +170,16 @@ let make_host ~__context ?(uuid = make_uuid ()) ?(name_label = "host")
     ?(external_auth_service_name = "") ?(external_auth_configuration = [])
     ?(license_params = []) ?(edition = "free") ?(license_server = [])
     ?(local_cache_sr = Ref.null) ?(chipset_info = []) ?(ssl_legacy = false)
-    ?(last_software_update = Date.epoch) ?(last_update_hash = "") () =
+    ?(last_software_update = Date.epoch) ?(last_update_hash = "")
Contributor

Why do we add these parameters to make_host? I don't see them being used.

Contributor Author

@gangj gangj Apr 2, 2025

They are used below, in lines 181 and 182.
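
For readers unfamiliar with the pattern, a test helper with optional labelled arguments might look like the sketch below. The record type, the field set and the default values are assumptions made for illustration; they are not the real make_host.

```ocaml
(* Illustrative only: a tiny stand-in for a test helper with optional
   arguments. The fields and defaults are assumptions, not xapi's. *)
type host = {
    name_label: string
  ; ssh_enabled: bool
  ; ssh_enabled_timeout: int64
}

(* Each optional argument has a default and is used in the record, so
   existing callers keep compiling while new tests can override it.
   The trailing unit lets callers omit every optional argument. *)
let make_host ?(name_label = "host") ?(ssh_enabled = true)
    ?(ssh_enabled_timeout = 0L) () =
  {name_label; ssh_enabled; ssh_enabled_timeout}

let default_host = make_host ()

let timed_host = make_host ~ssh_enabled_timeout:600L ()
```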

Db.Host.set_ssh_enabled ~__context ~self:host_ref ~value:ssh_enabled ;
Db.Host.set_ssh_enabled_timeout ~__context ~self:host_ref
~value:ssh_enabled_timeout ;
Db.Host.set_console_idle_timeout ~__context ~self:host_ref
Contributor

@BengangY BengangY Apr 2, 2025

set_console_idle_timeout also updates /root/.bashrc. Should this call the API host.set_console_idle_timeout instead of updating the DB directly?

Contributor Author

Yes, it is clearer with the API call; changed to that.
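
The difference between the two approaches can be shown with a toy model. Nothing here is xapi code: the two references stand in for the database field and for the setting written to /root/.bashrc, and the function names are hypothetical.

```ocaml
(* Toy model, not xapi code. *)
let db_console_idle_timeout = ref 0L

(* Stands in for the value that the API call writes to /root/.bashrc. *)
let shell_profile_timeout = ref 0L

(* Like a direct DB setter: updates the database field only. *)
let db_set_console_idle_timeout value = db_console_idle_timeout := value

(* Like the API call: updates the database and applies the side effect. *)
let api_set_console_idle_timeout value =
  db_set_console_idle_timeout value ;
  shell_profile_timeout := value
```

After `db_set_console_idle_timeout 300L` the database says 300 while the shell profile value is still 0, so the two disagree. After `api_set_console_idle_timeout 300L` both agree.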

Client.Host.get_ssh_enabled ~rpc ~session_id ~self:remote_coordinator
in
let ssh_enabled_timeout =
Client.Host.get_ssh_enabled_timeout ~rpc ~session_id
Contributor

Does Client.Host.get_ssh_enabled_timeout send an API call to the remote coordinator? If so, this makes four API calls. Could a single API call fetch all four parameters?

Contributor Author

I don't think we can do that without querying the whole record. Or is there a way?

Contributor

Could Client.Host.get_record work?

Contributor Author

@gangj gangj Apr 2, 2025

That would fetch every field of the host record, which I think is expensive.
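
The trade-off discussed in this thread can be made concrete with a toy model that counts round trips. This is not the xapi client: the record type, the `remote_call` helper and the choice of `ssh_expiry` as the fourth field are assumptions made for illustration.

```ocaml
(* Toy model, not the xapi client: every remote call bumps a counter so
   the number of round trips can be compared. *)
type host_record = {
    ssh_enabled: bool
  ; ssh_enabled_timeout: int64
  ; ssh_expiry: float
  ; console_idle_timeout: int64
  ; other_fields: (string * string) list  (* stands in for the rest *)
}

let round_trips = ref 0

let coordinator =
  {
    ssh_enabled= true
  ; ssh_enabled_timeout= 600L
  ; ssh_expiry= 1600.
  ; console_idle_timeout= 300L
  ; other_fields= [("name_label", "coordinator")]
  }

(* Every call costs one round trip, whatever the size of the reply. *)
let remote_call project =
  incr round_trips ;
  project coordinator

let get_ssh_enabled () = remote_call (fun r -> r.ssh_enabled)

let get_ssh_enabled_timeout () = remote_call (fun r -> r.ssh_enabled_timeout)

let get_ssh_expiry () = remote_call (fun r -> r.ssh_expiry)

let get_console_idle_timeout () = remote_call (fun r -> r.console_idle_timeout)

(* One round trip, but the reply carries every field of the record. *)
let get_record () = remote_call (fun r -> r)
```

Four field getters cost four round trips with small replies; one `get_record` costs a single round trip with a large reply. Which is cheaper depends on latency versus record size, which is the judgment call made in the thread.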

@gangj gangj force-pushed the private/gangj/CP-53711 branch from a09a2ed to 30bd072 Compare April 2, 2025 09:18
During pool join, create a new host object in the remote pool coordinator's
DB with the same SSH settings as the pool coordinator.

Also configure the SSH service locally before the xapi restart; this
configuration persists after the restart.

Signed-off-by: Gang Ji <[email protected]>
@gangj gangj force-pushed the private/gangj/CP-53711 branch from 30bd072 to a875364 Compare April 2, 2025 09:21
@gangj gangj merged commit 6e6c0ed into xapi-project:feature/configure-ssh-phase2 Apr 2, 2025
17 checks passed
Comment on lines +963 to +968
( match ssh_enabled with
| true ->
Xapi_host.enable_ssh ~__context ~self:host_ref
| false ->
Xapi_host.disable_ssh ~__context ~self:host_ref
) ;
Member

This doesn't really belong here in this create_or_get_host_on_master function, whose purpose is to create a host record for the joining host in the pool's DB. It would be better to factor out these additional side effects to keep the logic clean(er).

Member

From L958 actually.
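
One possible reading of this suggestion, as a sketch only: keep record creation free of service side effects and apply the SSH setting in a separate step. All names below are hypothetical, and the enable/disable actions are passed in as functions so that the example does not depend on any real service.

```ocaml
(* Sketch of the suggested split; hypothetical names, not xapi code. *)
type host = {name_label: string; ssh_enabled: bool}

(* Step 1: only build the record (stands in for creating the DB row). *)
let create_host_record ~name_label ~ssh_enabled = {name_label; ssh_enabled}

(* Step 2: the side effect, kept in its own function. *)
let apply_ssh_setting ~enable_ssh ~disable_ssh host =
  if host.ssh_enabled then enable_ssh () else disable_ssh ()

(* The caller composes the two steps explicitly. *)
let join ~enable_ssh ~disable_ssh ~name_label ~ssh_enabled =
  let host = create_host_record ~name_label ~ssh_enabled in
  apply_ssh_setting ~enable_ssh ~disable_ssh host ;
  host
```

With this shape, the record-creating function can be tested without touching the SSH service, and the side effect is visible at the call site.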
