
CKS Enhancements #9102


Open: wants to merge 103 commits into base: main


Contributor

@nvazquez nvazquez commented May 21, 2024

Description

Design Document: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CKS+Enhancements

Documentation PR: apache/cloudstack-documentation#458

This PR extends the CloudStack Kubernetes Service functionalities, matching these requirements:

  • Ability to specify different compute or service offerings for different types of CKS cluster nodes (worker, master or etcd): The createKubernetesCluster API and the corresponding UI must provide an option to specify a different offering for each type of node. CKS compute offerings will be marked as CKS compatible.
  • Ability to use CKS ready custom templates for CKS cluster nodes: CKS will allow users to specify their own templates for different CKS node types (control and worker) at the point of cluster creation. Those templates will be marked as CKS compatible.
  • Ability to use generic (non CKS ready) custom templates for CKS cluster nodes: CKS will allow users to specify their own templates for different CKS node types (control and worker) at the point of cluster creation. Those templates will be marked as CKS compatible. The user will be responsible for installing all necessary packages in the template.
  • Ability to add and remove a pre-created instance as a worker node to an existing CKS cluster: An instance (either virtual or physical) which has been built and prepared for CKS can be added to the desired CKS cluster. The instance must have all the CKS worker node packages installed.
  • Ability to separate etcd from master nodes of the CKS cluster: End users should be given an option to create a separate etcd cluster at the time of CKS cluster creation. The user can enable this option in the UI or in the createKubernetesCluster API and specify the size of the etcd cluster. Based on the user's input, CloudStack should provision the etcd nodes for the CKS cluster.
  • Ability to mark CKS cluster nodes for manual only upgrade: An end user should be able to mark the desired compute offering (or the CKS template) for manual upgrades only. CKS cluster nodes marked for manual upgrade should be untouched during the Kubernetes version upgrade when executed using upgradeKubernetesCluster API.
  • Ability to dedicate specific hosts/clusters to a specific domain for CKS cluster deployment: The dedicateHost/dedicateCluster APIs can be used to dedicate hosts/clusters for CKS cluster deployments. CKS cluster node VMs will then be deployed in the dedicated cluster by default.
  • Methodology for AS number management: Operators should be able to assign a range of AS numbers to an ACS Zone. ACS must have a method to assign an AS number to each Isolated network (or VPC tier), which can be retrieved via the UI and API. (Introduced in PR #9470, "New feature: Dynamic and Static Routing".)
  • Methodology to use diverse CNI plugins (Calico, Cilium, etc.): End users should be able to deploy CKS clusters with Calico CNI. An option to specify which CNI plugin is used for a CKS cluster must be provided in the createKubernetesCluster API. The CNI configuration and setup can be registered as managed userdata, and any configurable parameters (here, the AS number, BGP peer AS number and IP address) can be defined as variables in the userdata and set during the creation of the CKS cluster. This provides a flexible way for users to use the CNI plugin of their choice.
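
As a rough illustration of the per-node-type offering and separate-etcd requirements above, the sketch below assembles the query parameters for a createKubernetesCluster call. The parameter names `nodeofferings` (with `node` and `offering` keys) and `etcdnodes`, the node-type labels, and the helper `build_create_cluster_params` are assumptions made for illustration only; the design document and the API reference are the authority on the real names. Nothing is signed or sent.

```python
from urllib.parse import urlencode


def build_create_cluster_params(name, zone_id, version_id, default_offering_id,
                                size, node_offerings=None, etcd_nodes=0):
    """Build query parameters for a createKubernetesCluster request.

    node_offerings maps a node-type label (e.g. "worker") to an offering ID.
    """
    params = {
        "command": "createKubernetesCluster",
        "name": name,
        "zoneid": zone_id,
        "kubernetesversionid": version_id,
        "serviceofferingid": default_offering_id,  # fallback offering
        "size": size,                              # number of worker nodes
        "response": "json",
    }
    # ASSUMPTION: per-node-type offerings travel as an indexed map parameter,
    # in the same style CloudStack uses for other map parameters.
    for i, (node_type, offering_id) in enumerate(sorted((node_offerings or {}).items())):
        params[f"nodeofferings[{i}].node"] = node_type
        params[f"nodeofferings[{i}].offering"] = offering_id
    # ASSUMPTION: a separate etcd cluster is requested by giving its size.
    if etcd_nodes > 0:
        params["etcdnodes"] = etcd_nodes
    return params


if __name__ == "__main__":
    # All IDs below are placeholders, not real resources.
    query = build_create_cluster_params(
        "demo", "zone-1", "ver-1", "off-default", 3,
        node_offerings={"worker": "off-w", "etcd": "off-e"},
        etcd_nodes=3,
    )
    print(urlencode(query))
```

Keeping the default `serviceofferingid` alongside the per-type map means node types without an explicit entry still have an offering to fall back on, which is one plausible reading of the requirement.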

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Testing is in progress on a vCenter 7.0 environment with NSX SDN.

How did you try to break this feature and the system with this change?

nvazquez and others added 8 commits May 21, 2024 12:40
* Ability to specify different compute or service offerings for different types of CKS cluster nodes – worker, master or etcd

* Ability to use CKS ready custom templates for CKS cluster nodes

---------

Co-authored-by: Pearl Dsilva <[email protected]>
… a kubernetes cluster

---------

Co-authored-by: nvazquez <[email protected]>
* CKS: Fix ISO attach logic

* address comment
@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 9649


codecov bot commented May 22, 2024

Codecov Report

Attention: Patch coverage is 10.64855% with 2039 lines in your changes missing coverage. Please review.

Project coverage is 16.55%. Comparing base (7632814) to head (0be2cd5).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...bernetes/cluster/KubernetesClusterManagerImpl.java | 11.95% | 320 Missing and 4 partials ⚠️ |
| ...r/actionworkers/KubernetesClusterActionWorker.java | 1.14% | 260 Missing ⚠️ |
| ...er/actionworkers/KubernetesClusterStartWorker.java | 0.00% | 247 Missing ⚠️ |
| ...ster/actionworkers/KubernetesClusterAddWorker.java | 0.00% | 211 Missing ⚠️ |
| ...KubernetesClusterResourceModifierActionWorker.java | 0.00% | 119 Missing ⚠️ |
| ...r/actionworkers/KubernetesClusterRemoveWorker.java | 0.00% | 106 Missing ⚠️ |
| ...er/actionworkers/KubernetesClusterScaleWorker.java | 28.14% | 92 Missing and 5 partials ⚠️ |
| ...ava/com/cloud/upgrade/dao/Upgrade42010to42100.java | 26.25% | 57 Missing and 2 partials ⚠️ |
| .../cloud/kubernetes/cluster/KubernetesClusterVO.java | 0.00% | 56 Missing ⚠️ |
| ...dstack/api/response/KubernetesClusterResponse.java | 0.00% | 53 Missing ⚠️ |

... and 51 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9102      +/-   ##
============================================
- Coverage     16.57%   16.55%   -0.03%     
- Complexity    13868    13907      +39     
============================================
  Files          5719     5732      +13     
  Lines        507178   509379    +2201     
  Branches      61571    61850     +279     
============================================
+ Hits          84085    84305     +220     
- Misses       413674   415636    +1962     
- Partials       9419     9438      +19     
Flag Coverage Δ
uitests 3.93% <ø> (-0.04%) ⬇️
unittests 17.43% <10.64%> (-0.03%) ⬇️
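
The figures in the report above can be cross-checked with a little arithmetic. The sketch below assumes that patch coverage is covered lines divided by all patch lines, with partially covered lines counted among the missing ones; that is an assumption about how the percentage is formed, not something the report states. The helper names are illustrative.

```python
from typing import Tuple


def patch_totals(patch_pct: float, missing: int) -> Tuple[int, int]:
    """Return (total_lines, covered_lines) implied by a patch-coverage
    percentage and the number of lines reported as missing coverage."""
    total = round(missing / (1 - patch_pct / 100))
    return total, total - missing


def coverage_pct(hits: int, lines: int) -> float:
    """Project coverage as a percentage of all tracked lines."""
    return 100 * hits / lines


# Numbers copied from the report: 10.64855% patch coverage, 2039 lines missing.
total, covered = patch_totals(10.64855, 2039)

# Hits and Lines rows of the coverage diff, base (main) versus head (#9102).
base = coverage_pct(84085, 507178)
head = coverage_pct(84305, 509379)

print(total, covered)
print(round(base, 3), round(head, 3), round(head - base, 3))
```

Under that assumption the patch works out to 2282 measured lines, 243 of them covered, and the base and head percentages agree with the reported 16.57% and 16.55% to within about 0.01 percentage points.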


@nvazquez nvazquez force-pushed the cks-enhancements-upstream branch from 5710f92 to 469c08d Compare May 22, 2024 00:59
@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 9650

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 13561

@nvazquez
Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13562

@nvazquez
Contributor Author

@blueorangutan test matrix

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2 ) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13439)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 54305 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t13439-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

[SF] Trillian test result (tid-13440)
Environment: kvm-ubuntu22 (x2), Advanced Networking with Mgmt server u22
Total time taken: 61952 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t13440-kvm-ubuntu22.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

[SF] Trillian test result (tid-13441)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 62096 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t13441-vmware-70u3.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_prepare_and_cancel_maintenance | Error | 0.16 | test_ms_maintenance_and_safe_shutdown.py |

@blueorangutan

[SF] Trillian test result (tid-13442)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server ol9
Total time taken: 85238 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t13442-xcpng82.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_prepare_and_cancel_maintenance | Error | 0.18 | test_ms_maintenance_and_safe_shutdown.py |

@apache apache deleted a comment from blueorangutan May 31, 2025
@nvazquez
Contributor Author

nvazquez commented Jun 2, 2025

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13587

@nvazquez
Contributor Author

nvazquez commented Jun 2, 2025

@blueorangutan test matrix

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2 ) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-13457)

@blueorangutan

[SF] Trillian test result (tid-13454)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 84982 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t13454-kvm-ol8.zip
Smoke tests completed. 130 look OK, 11 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_nic_secondaryip_add_remove | Error | 1518.26 | test_multipleips_per_nic.py |
| ContextSuite context=TestNestedVirtualization>:setup | Error | 0.00 | test_nested_virtualization.py |
| ContextSuite context=TestNetworkACL>:setup | Error | 0.00 | test_network_acl.py |
| ContextSuite context=TestIpv6Network>:setup | Error | 0.00 | test_network_ipv6.py |
| test_delete_account | Error | 1517.43 | test_network.py |
| test_delete_network_while_vm_on_it | Error | 1.23 | test_network.py |
| test_deploy_vm_l2network | Error | 1.24 | test_network.py |
| test_l2network_restart | Error | 2.37 | test_network.py |
| ContextSuite context=TestPortForwarding>:setup | Error | 3.65 | test_network.py |
| ContextSuite context=TestPublicIP>:setup | Error | 12.94 | test_network.py |
| test_reboot_router | Failure | 0.09 | test_network.py |
| test_releaseIP | Error | 6.77 | test_network.py |
| test_releaseIP_using_IP | Error | 6.66 | test_network.py |
| ContextSuite context=TestRouterRules>:setup | Error | 6.75 | test_network.py |
| ContextSuite context=TestSharedNetworkWithConfigDrive>:setup | Error | 1521.93 | test_network.py |
| ContextSuite context=TestPrivateGwACL>:setup | Error | 0.00 | test_privategw_acl.py |
| ContextSuite context=TestAdapterTypeForNic>:setup | Error | 0.00 | test_nic_adapter_type.py |
| ContextSuite context=TestNonStrictAffinityGroups>:setup | Error | 0.00 | test_nonstrict_affinity_group.py |
| ContextSuite context=TestIsolatedNetworksPasswdServer>:setup | Error | 0.00 | test_password_server.py |
| ContextSuite context=TestPortForwardingRules>:setup | Error | 0.00 | test_portforwardingrules.py |
| ContextSuite context=TestProjectSuspendActivate>:setup | Error | 1529.43 | test_projects.py |

@blueorangutan

[SF] Trillian test result (tid-13455)
Environment: kvm-ubuntu22 (x2), Advanced Networking with Mgmt server u22
Total time taken: 94420 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t13455-kvm-ubuntu22.zip
Smoke tests completed. 129 look OK, 12 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestClusterDRS>:setup | Error | 0.00 | test_cluster_drs.py |
| test_nic_secondaryip_add_remove | Error | 1520.44 | test_multipleips_per_nic.py |
| ContextSuite context=TestNestedVirtualization>:setup | Error | 0.00 | test_nested_virtualization.py |
| ContextSuite context=TestNetworkACL>:setup | Error | 0.00 | test_network_acl.py |
| ContextSuite context=TestIpv6Network>:setup | Error | 0.00 | test_network_ipv6.py |
| test_delete_account | Error | 1518.12 | test_network.py |
| test_delete_network_while_vm_on_it | Error | 1.25 | test_network.py |
| test_deploy_vm_l2network | Error | 1.25 | test_network.py |
| test_l2network_restart | Error | 2.40 | test_network.py |
| ContextSuite context=TestPortForwarding>:setup | Error | 3.76 | test_network.py |
| ContextSuite context=TestPublicIP>:setup | Error | 12.54 | test_network.py |
| test_reboot_router | Failure | 0.09 | test_network.py |
| test_releaseIP | Error | 6.27 | test_network.py |
| test_releaseIP_using_IP | Error | 6.64 | test_network.py |
| ContextSuite context=TestRouterRules>:setup | Error | 6.73 | test_network.py |
| ContextSuite context=TestSharedNetworkWithConfigDrive>:setup | Error | 1522.54 | test_network.py |
| ContextSuite context=TestAdapterTypeForNic>:setup | Error | 0.00 | test_nic_adapter_type.py |
| ContextSuite context=TestNonStrictAffinityGroups>:setup | Error | 0.00 | test_nonstrict_affinity_group.py |
| ContextSuite context=TestIsolatedNetworksPasswdServer>:setup | Error | 0.00 | test_password_server.py |
| ContextSuite context=TestPortForwardingRules>:setup | Error | 0.00 | test_portforwardingrules.py |
| ContextSuite context=TestPrivateGwACL>:setup | Error | 0.00 | test_privategw_acl.py |
| ContextSuite context=TestProjectSuspendActivate>:setup | Error | 1530.42 | test_projects.py |

@blueorangutan

[SF] Trillian test result (tid-13456)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 111427 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9102-t13456-vmware-70u3.zip
Smoke tests completed. 137 look OK, 4 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestListVolumes>:setup | Error | 0.00 | test_list_volumes.py |
| test_01_prepare_and_cancel_maintenance | Error | 0.16 | test_ms_maintenance_and_safe_shutdown.py |
| test_01_deploy_vm_on_specific_host | Error | 13.72 | test_vm_deployment_planner.py |
| test_02_deploy_vm_on_specific_cluster | Error | 3603.44 | test_vm_deployment_planner.py |
| test_03_deploy_vm_on_specific_pod | Error | 3.49 | test_vm_deployment_planner.py |
| test_04_deploy_vm_on_host_override_pod_and_cluster | Error | 1.41 | test_vm_deployment_planner.py |
| test_05_deploy_vm_on_cluster_override_pod | Error | 2.37 | test_vm_deployment_planner.py |
| test_01_migrate_vm_strict_tags_success | Error | 3604.01 | test_vm_strict_host_tags.py |
| test_02_migrate_vm_strict_tags_failure | Error | 3.92 | test_vm_strict_host_tags.py |
| test_01_restore_vm_strict_tags_success | Error | 21.21 | test_vm_strict_host_tags.py |
| test_02_restore_vm_strict_tags_failure | Error | 3603.92 | test_vm_strict_host_tags.py |
| test_01_scale_vm_strict_tags_success | Error | 19.23 | test_vm_strict_host_tags.py |
| test_02_scale_vm_strict_tags_failure | Error | 3604.22 | test_vm_strict_host_tags.py |
| test_01_deploy_vm_on_specific_host_without_strict_tags | Error | 17.22 | test_vm_strict_host_tags.py |
| test_02_deploy_vm_on_any_host_without_strict_tags | Error | 3606.16 | test_vm_strict_host_tags.py |
| test_03_deploy_vm_on_specific_host_with_strict_tags_success | Error | 9.07 | test_vm_strict_host_tags.py |
| test_04_deploy_vm_on_any_host_with_strict_tags_success | Error | 26.34 | test_vm_strict_host_tags.py |

@nvazquez
Contributor Author

nvazquez commented Jun 4, 2025

@blueorangutan test matrix

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2 ) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-13467)

@Pearl1594
Contributor

@blueorangutan test ol8 vmware70u3

@blueorangutan

@Pearl1594 [SL] unsupported parameters provided. Supported mgmt server os are: ol8, ol9, debian12, rocky8, alma9, suse15, centos7, centos6, alma8, ubuntu18, ubuntu22, ubuntu20, ubuntu24. Supported hypervisors are: kvm-centos6, kvm-centos7, kvm-rocky8, kvm-ol8, kvm-ol9, kvm-alma8, kvm-alma9, kvm-ubuntu18, kvm-ubuntu20, kvm-ubuntu22, kvm-ubuntu24, kvm-debian12, kvm-suse15, vmware-55u3, vmware-60u2, vmware-65u2, vmware-67u3, vmware-70u1, vmware-70u2, vmware-70u3, vmware-80, vmware-80u1, vmware-80u2, vmware-80u3, xenserver-65sp1, xenserver-71, xenserver-74, xenserver-84, xcpng74, xcpng76, xcpng80, xcpng81, xcpng82, xcpng83
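
The rejection above comes down to the hypervisor being written as `vmware70u3` rather than `vmware-70u3`. A minimal sketch of that check follows, using the two lists exactly as the bot printed them. It is not the bot's actual implementation: the function name `check_test_command` is hypothetical, and only the two-parameter form `@blueorangutan test <mgmt-os> <hypervisor>` is handled.

```python
SUPPORTED_MGMT_OS = set(
    "ol8 ol9 debian12 rocky8 alma9 suse15 centos7 centos6 alma8 "
    "ubuntu18 ubuntu22 ubuntu20 ubuntu24".split()
)

SUPPORTED_HYPERVISORS = set(
    "kvm-centos6 kvm-centos7 kvm-rocky8 kvm-ol8 kvm-ol9 kvm-alma8 kvm-alma9 "
    "kvm-ubuntu18 kvm-ubuntu20 kvm-ubuntu22 kvm-ubuntu24 kvm-debian12 kvm-suse15 "
    "vmware-55u3 vmware-60u2 vmware-65u2 vmware-67u3 vmware-70u1 vmware-70u2 "
    "vmware-70u3 vmware-80 vmware-80u1 vmware-80u2 vmware-80u3 "
    "xenserver-65sp1 xenserver-71 xenserver-74 xenserver-84 "
    "xcpng74 xcpng76 xcpng80 xcpng81 xcpng82 xcpng83".split()
)


def check_test_command(comment: str) -> list:
    """Return a list of problems; an empty list means both parameters
    appear in the supported lists."""
    words = comment.split()
    if len(words) != 4 or words[:2] != ["@blueorangutan", "test"]:
        return ["expected '@blueorangutan test <mgmt-os> <hypervisor>'"]
    mgmt_os, hypervisor = words[2], words[3]
    problems = []
    if mgmt_os not in SUPPORTED_MGMT_OS:
        problems.append("unsupported mgmt server os: " + mgmt_os)
    if hypervisor not in SUPPORTED_HYPERVISORS:
        problems.append("unsupported hypervisor: " + hypervisor)
    return problems


if __name__ == "__main__":
    print(check_test_command("@blueorangutan test ol8 vmware70u3"))
    print(check_test_command("@blueorangutan test ol8 vmware-70u3"))
```

Exact string matching is deliberate here: the xcpng names carry no hyphen while the vmware and kvm names do, so guessing a normalisation would be easy to get wrong.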

@Pearl1594
Contributor

@blueorangutan test ol8 vmware-70u3

@blueorangutan

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests


9 participants