CP-53711: Copy SSH settings from pool coordinator in pool join #6395


Merged
53 changes: 51 additions & 2 deletions ocaml/idl/datamodel_host.ml
@@ -1297,14 +1297,63 @@ let create_params =
; param_doc=
"The SHA256 checksum of updateinfo of the most recently applied update \
on the host"
- ; param_release= numbered_release "24.39.0-next"
+ ; param_release= numbered_release "24.40.0"
; param_default= Some (VString "")
}
+ ; {
+ param_type= Bool
+ ; param_name= "ssh_enabled"
+ ; param_doc= "True if SSH access is enabled for the host"
+ ; param_release= numbered_release "25.14.0-next"
+ ; param_default= Some (VBool true)
+ }
+ ; {
+ param_type= Int
+ ; param_name= "ssh_enabled_timeout"
+ ; param_doc=
+ "The timeout in seconds after which SSH access will be automatically \
+ disabled (0 means never), this setting will be applied every time the \
+ SSH is enabled by XAPI"
+ ; param_release= numbered_release "25.14.0-next"
+ ; param_default= Some (VInt 0L)
+ }
+ ; {
+ param_type= DateTime
+ ; param_name= "ssh_expiry"
+ ; param_doc=
+ "The time in UTC after which the SSH access will be automatically \
+ disabled"
+ ; param_release= numbered_release "25.14.0-next"
+ ; param_default= Some (VDateTime Date.epoch)
+ }
+ ; {
+ param_type= Int
+ ; param_name= "console_idle_timeout"
+ ; param_doc=
+ "The timeout in seconds after which idle console will be automatically \
+ terminated (0 means never)"
+ ; param_release= numbered_release "25.14.0-next"
+ ; param_default= Some (VInt 0L)
+ }
]

let create =
call ~name:"create" ~in_oss_since:None
- ~lifecycle:[(Published, rel_rio, "Create a new host record")]
+ ~lifecycle:
+ [
+ (Published, rel_rio, "Create a new host record")
+ ; ( Changed
+ , "24.40.0"
+ , "Added --last_update_hash option to allow last_update_hash to be \
+ kept for host joined a pool"
+ )
+ ; ( Changed
+ , "25.14.0-next"
+ , "Added --ssh_enabled --ssh_enabled_timeout --ssh_expiry \
+ --console_idle_timeout options to allow them to be configured for \
+ new host"
+ )
+ ]
~versioned_params:create_params ~doc:"Create a new host record"
~result:(Ref _host, "Reference to the newly created host object.")
~hide_from_docs:true ~allowed_roles:_R_POOL_OP ()
14 changes: 11 additions & 3 deletions ocaml/idl/datamodel_pool.ml
@@ -1249,7 +1249,15 @@ let remove_repository =

let sync_updates =
call ~name:"sync_updates"
- ~lifecycle:[(Published, "1.329.0", "")]
+ ~lifecycle:
+ [
+ (Published, "1.329.0", "")
+ ; ( Changed
+ , "25.7.0"
+ , "Added --username --password options to allow syncing updates from a \
+ remote_pool type repository"
+ )
+ ]
~doc:"Sync with the enabled repository"
~versioned_params:
[
@@ -1286,14 +1294,14 @@ let sync_updates =
param_type= String
; param_name= "username"
; param_doc= "The username of the remote pool"
- ; param_release= numbered_release "25.6.0-next"
+ ; param_release= numbered_release "25.7.0"
Contributor

Should we update these release versions when the feature branch is ready to be merged into the master branch?

Contributor Author

This is a fixup for merged code, not from the feature branch.

Contributor

I think it's easy to forget to change the -next version to the release version. Please remember to do it when merging to master. BTW, there is gen_lifecycle to check datamodel_lifecycle.ml, see here. I think a similar method could be used for datamodel_pool and datamodel_host too. It can be considered in the future.
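
As a rough illustration of the kind of check suggested above, the sketch below flags release strings that still end in the -next placeholder. It is not how gen_lifecycle works; `has_next_suffix`, `unreleased`, and the sample parameter list are hypothetical names invented for this example.

```ocaml
(* Hypothetical helper: does a release string still carry the "-next" placeholder? *)
let has_next_suffix release =
  let suffix = "-next" in
  let n = String.length release and m = String.length suffix in
  n >= m && String.sub release (n - m) m = suffix

(* Keep every (param_name, release) pair that has not been given a real version. *)
let unreleased params =
  List.filter (fun (_name, release) -> has_next_suffix release) params

let () =
  (* Sample data only; the real values live in datamodel_host.ml and datamodel_pool.ml. *)
  let params =
    [
      ("username", "25.7.0")
    ; ("password", "25.7.0")
    ; ("ssh_enabled", "25.14.0-next")
    ]
  in
  List.iter
    (fun (name, release) -> Printf.printf "%s still uses %s\n" name release)
    (unreleased params)
```

A check of this shape could run at build or merge time so that a placeholder cannot reach master unnoticed.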

; param_default= Some (VString "")
}
; {
param_type= String
; param_name= "password"
; param_doc= "The password of the remote pool"
- ; param_release= numbered_release "25.6.0-next"
+ ; param_release= numbered_release "25.7.0"
; param_default= Some (VString "")
}
]
7 changes: 5 additions & 2 deletions ocaml/tests/common/test_common.ml
@@ -170,13 +170,16 @@ let make_host ~__context ?(uuid = make_uuid ()) ?(name_label = "host")
?(external_auth_service_name = "") ?(external_auth_configuration = [])
?(license_params = []) ?(edition = "free") ?(license_server = [])
?(local_cache_sr = Ref.null) ?(chipset_info = []) ?(ssl_legacy = false)
- ?(last_software_update = Date.epoch) ?(last_update_hash = "") () =
+ ?(last_software_update = Date.epoch) ?(last_update_hash = "")
Contributor

Why do we add these parameters to make_host? I don't see them being used.

Contributor Author

@gangj gangj Apr 2, 2025

They are used below, in lines 181 and 182.

+ ?(ssh_enabled = true) ?(ssh_enabled_timeout = 0L) ?(ssh_expiry = Date.epoch)
+ ?(console_idle_timeout = 0L) () =
let host =
Xapi_host.create ~__context ~uuid ~name_label ~name_description ~hostname
~address ~external_auth_type ~external_auth_service_name
~external_auth_configuration ~license_params ~edition ~license_server
~local_cache_sr ~chipset_info ~ssl_legacy ~last_software_update
- ~last_update_hash
+ ~last_update_hash ~ssh_enabled ~ssh_enabled_timeout ~ssh_expiry
+ ~console_idle_timeout
in
Db.Host.set_cpu_info ~__context ~self:host ~value:default_cpu_info ;
host
2 changes: 2 additions & 0 deletions ocaml/tests/test_host.ml
@@ -24,6 +24,8 @@ let add_host __context name =
~license_params:[] ~edition:"" ~license_server:[]
~local_cache_sr:Ref.null ~chipset_info:[] ~ssl_legacy:false
~last_software_update:Clock.Date.epoch ~last_update_hash:""
+ ~ssh_enabled:true ~ssh_enabled_timeout:0L ~ssh_expiry:Clock.Date.epoch
+ ~console_idle_timeout:0L
)

(* Creates an unlicensed pool with the maximum number of hosts *)
3 changes: 2 additions & 1 deletion ocaml/xapi/dbsync_slave.ml
@@ -59,7 +59,8 @@ let create_localhost ~__context info =
~external_auth_configuration:[] ~license_params:[] ~edition:""
~license_server:[("address", "localhost"); ("port", "27000")]
~local_cache_sr:Ref.null ~chipset_info:[] ~ssl_legacy:false
- ~last_software_update:Date.epoch ~last_update_hash:""
+ ~last_software_update:Date.epoch ~last_update_hash:"" ~ssh_enabled:true
+ ~ssh_enabled_timeout:0L ~ssh_expiry:Date.epoch ~console_idle_timeout:0L
in
()

8 changes: 4 additions & 4 deletions ocaml/xapi/xapi_host.ml
@@ -978,7 +978,8 @@ let is_host_alive ~__context ~host =
let create ~__context ~uuid ~name_label ~name_description:_ ~hostname ~address
~external_auth_type ~external_auth_service_name ~external_auth_configuration
~license_params ~edition ~license_server ~local_cache_sr ~chipset_info
- ~ssl_legacy:_ ~last_software_update ~last_update_hash =
+ ~ssl_legacy:_ ~last_software_update ~last_update_hash ~ssh_enabled
+ ~ssh_enabled_timeout ~ssh_expiry ~console_idle_timeout =
(* fail-safe. We already test this on the joining host, but it's racy, so multiple concurrent
pool-join might succeed. Note: we do it in this order to avoid a problem checking restrictions during
the initial setup of the database *)
@@ -1042,9 +1043,8 @@ let create ~__context ~uuid ~name_label ~name_description:_ ~hostname ~address
~multipathing:false ~uefi_certificates:"" ~editions:[] ~pending_guidances:[]
~tls_verification_enabled ~last_software_update ~last_update_hash
~recommended_guidances:[] ~latest_synced_updates_applied:`unknown
- ~pending_guidances_recommended:[] ~pending_guidances_full:[]
- ~ssh_enabled:true ~ssh_enabled_timeout:0L ~ssh_expiry:Date.epoch
- ~console_idle_timeout:0L ;
+ ~pending_guidances_recommended:[] ~pending_guidances_full:[] ~ssh_enabled
+ ~ssh_enabled_timeout ~ssh_expiry ~console_idle_timeout ;
(* If the host we're creating is us, make sure its set to live *)
Db.Host_metrics.set_last_updated ~__context ~self:metrics ~value:(Date.now ()) ;
Db.Host_metrics.set_live ~__context ~self:metrics ~value:host_is_us ;
4 changes: 4 additions & 0 deletions ocaml/xapi/xapi_host.mli
@@ -130,6 +130,10 @@ val create :
-> ssl_legacy:bool
-> last_software_update:API.datetime
-> last_update_hash:string
+ -> ssh_enabled:bool
+ -> ssh_enabled_timeout:int64
+ -> ssh_expiry:API.datetime
+ -> console_idle_timeout:int64
-> [`host] Ref.t

val destroy : __context:Context.t -> self:API.ref_host -> unit
35 changes: 34 additions & 1 deletion ocaml/xapi/xapi_pool.ml
@@ -943,6 +943,38 @@ let rec create_or_get_host_on_master __context rpc session_id (host_ref, host) :
create_or_get_sr_on_master __context rpc session_id
(my_local_cache_sr, my_local_cache_sr_rec)
in
+ let remote_coordinator = get_master ~rpc ~session_id in
+ let ssh_enabled =
+ Client.Host.get_ssh_enabled ~rpc ~session_id ~self:remote_coordinator
+ in
+ let ssh_enabled_timeout =
+ Client.Host.get_ssh_enabled_timeout ~rpc ~session_id
Contributor

Will Client.Host.get_ssh_enabled_timeout send an API call to the remote coordinator? If so, this code makes 4 API calls. Could it make a single API call to fetch all 4 parameters?

Contributor Author

I don't think we can do that without querying the whole host record. Or is there a way?

Contributor

Could Client.Host.get_record work?

Contributor Author

@gangj gangj Apr 2, 2025

That would fetch every field of the host record, which I think is expensive.
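
The trade-off discussed in this thread can be sketched with a mock that counts round trips. Nothing here is the real Client.Host API: `host_record` is a trimmed, hypothetical record, and `remote`, `fetch_per_field`, and `fetch_whole_record` are names made up for the example. It models the three remote getters visible in this diff.

```ocaml
(* Hypothetical, trimmed-down host record; the real record has many more fields. *)
type host_record = {
    ssh_enabled: bool
  ; ssh_enabled_timeout: int64
  ; console_idle_timeout: int64
}

(* Mock coordinator state and a counter of simulated API round trips. *)
let coordinator =
  {ssh_enabled= true; ssh_enabled_timeout= 600L; console_idle_timeout= 0L}

let round_trips = ref 0

(* Every simulated remote call costs exactly one round trip. *)
let remote f = incr round_trips ; f coordinator

let get_ssh_enabled () = remote (fun r -> r.ssh_enabled)

let get_ssh_enabled_timeout () = remote (fun r -> r.ssh_enabled_timeout)

let get_console_idle_timeout () = remote (fun r -> r.console_idle_timeout)

let get_record () = remote (fun r -> r)

(* Option 1: one call per field, small payloads, several round trips. *)
let fetch_per_field () =
  let enabled = get_ssh_enabled () in
  let timeout = get_ssh_enabled_timeout () in
  let idle = get_console_idle_timeout () in
  (enabled, timeout, idle)

(* Option 2: one call, but the whole record travels over the wire. *)
let fetch_whole_record () =
  let r = get_record () in
  (r.ssh_enabled, r.ssh_enabled_timeout, r.console_idle_timeout)

let () =
  round_trips := 0 ;
  ignore (fetch_per_field ()) ;
  Printf.printf "per-field getters: %d round trips\n" !round_trips ;
  round_trips := 0 ;
  ignore (fetch_whole_record ()) ;
  Printf.printf "whole record: %d round trip\n" !round_trips
```

Which option is cheaper depends on how large the real record is relative to the latency of a round trip, which this mock does not measure.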

+ ~self:remote_coordinator
+ in
+ let console_idle_timeout =
+ Client.Host.get_console_idle_timeout ~rpc ~session_id
+ ~self:remote_coordinator
+ in
+ (* Configure SSH service on local host *)
+ Xapi_host.set_console_idle_timeout ~__context ~self:host_ref
+ ~value:console_idle_timeout ;
+ Xapi_host.set_ssh_enabled_timeout ~__context ~self:host_ref
+ ~value:ssh_enabled_timeout ;
+ ( match ssh_enabled with
+ | true ->
+ Xapi_host.enable_ssh ~__context ~self:host_ref
+ | false ->
+ Xapi_host.disable_ssh ~__context ~self:host_ref
+ ) ;
Comment on lines +963 to +968
Member

This doesn't really belong in create_or_get_host_on_master, whose purpose is to create a host record for the joining host in the pool's DB. It would be better to factor out these additional side effects to keep the logic clean(er).

Member

From L958 actually.

Contributor Author

+ (* As ssh_expiry will be updated by host.enable_ssh and host.disable_ssh,
+ there is a corner case when the joiner's SSH state will not match SSH
+ service state in its new coordinator exactly: if the joiner joins when
+ SSH service has been enabled in the new coordinator, while not timed
+ out yet, the joiner will start SSH service with timeout
+ host.ssh_enabled_timeout, which means SSH service in the joiner will
+ be disabled later than in the new coordinator. *)
+ let ssh_expiry = Db.Host.get_ssh_expiry ~__context ~self:host_ref in
Contributor

The joining host is going through a reboot. What is the general policy for SSH over a reboot? If SSH was enabled before reboot, what is the state after reboot?

Contributor Author

@gangj gangj Apr 2, 2025

The joining host will not reboot; only xapi restarts to finish the join. A host does reboot when it is ejected from a pool, though. I think that in the current design the SSH service status is kept across a reboot:

  1. If the SSH service is enabled without a timeout, or is disabled, its state does not change after the reboot.
  2. If the SSH service is enabled with a timeout, after the reboot it is enabled with the remaining timeout and disabled once the time is up.
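
The two rules above can be written down as a small decision function. This is only a sketch of the policy as described in the comment, not xapi code: the `action` type, `after_reboot`, and the use of float seconds are assumptions for illustration, and the last case (expiry already passed during the reboot) is not stated in the comment and is assumed to mean "disable".

```ocaml
(* Hypothetical model of what to do with the SSH service once the host is back up. *)
type action =
  | Keep_disabled
  | Enable_forever
  | Enable_for of float  (* remaining seconds *)
  | Disable

(* [expiry = None] models "enabled without a timeout".
   Times are seconds since the epoch. *)
let after_reboot ~enabled ~expiry ~now =
  match (enabled, expiry) with
  | false, _ ->
      Keep_disabled (* rule 1: disabled stays disabled *)
  | true, None ->
      Enable_forever (* rule 1: no timeout, state unchanged *)
  | true, Some t when t > now ->
      Enable_for (t -. now) (* rule 2: resume with the remaining time *)
  | true, Some _ ->
      Disable (* assumption: the timeout elapsed while the host was down *)

let describe = function
  | Keep_disabled ->
      "stay disabled"
  | Enable_forever ->
      "enable, no timeout"
  | Enable_for s ->
      Printf.sprintf "enable for %.0f more seconds" s
  | Disable ->
      "disable"

let () =
  print_endline
    (describe (after_reboot ~enabled:true ~expiry:(Some 100.) ~now:40.))
```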

Contributor

So the current policy is to sync the same status, including ssh_expiry, from the pool coordinator to the newly joined host. It relies on the reboot mechanism to ensure SSH is disabled eventually. Essentially, the newly joined host will follow the same process as the pool coordinator: reboot, check if the expiry time is greater than the current time, and trigger a disable.

Contributor Author

@gangj gangj Apr 2, 2025

So the current policy is to sync the same status, including ssh_expiry, from the pool coordinator to the newly joined host.

No, ssh_expiry will be now + ssh_enabled_timeout for the newly joined host, as we discussed offline. Please check the code.
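
A small worked example of this rule, and of the corner case described in the code comment earlier in this hunk, may help. It is a sketch under stated assumptions: times are plain int64 seconds, `None` stands for "never expires" (the real field defaults to Date.epoch), and `expiry_on_enable` and `joiner_lag` are hypothetical names, not xapi functions.

```ocaml
(* Expiry computed at the moment SSH is enabled: now + timeout.
   Assumption: a timeout of 0 means "never", as in the ssh_enabled_timeout doc. *)
let expiry_on_enable ~now ~timeout =
  if timeout = 0L then None else Some (Int64.add now timeout)

(* How many seconds later the joiner's SSH is disabled than the coordinator's,
   when both use the same timeout but enable SSH at different moments. *)
let joiner_lag ~coordinator_enabled_at ~joined_at ~timeout =
  let coordinator = expiry_on_enable ~now:coordinator_enabled_at ~timeout in
  let joiner = expiry_on_enable ~now:joined_at ~timeout in
  match (coordinator, joiner) with
  | Some c, Some j ->
      Some (Int64.sub j c)
  | _ ->
      None

let () =
  (* Coordinator enabled SSH at t=1000 with a 600 s timeout; the host joins at t=1200. *)
  match
    joiner_lag ~coordinator_enabled_at:1000L ~joined_at:1200L ~timeout:600L
  with
  | Some lag ->
      Printf.printf "joiner expires %Ld seconds later\n" lag
  | None ->
      print_endline "no expiry on either host"
```

With these sample numbers the coordinator expires at 1600 and the joiner at 1800, so the lag equals the time between the coordinator enabling SSH and the join.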


debug "Creating host object on master" ;
let ref =
Client.Host.create ~rpc ~session_id ~uuid:my_uuid
@@ -962,7 +994,8 @@ let rec create_or_get_host_on_master __context rpc session_id (host_ref, host) :
~local_cache_sr ~chipset_info:host.API.host_chipset_info
~ssl_legacy:false
~last_software_update:host.API.host_last_software_update
- ~last_update_hash:host.API.host_last_update_hash
+ ~last_update_hash:host.API.host_last_update_hash ~ssh_enabled
+ ~ssh_enabled_timeout ~ssh_expiry ~console_idle_timeout
in
(* Copy other-config into newly created host record: *)
no_exn