@davidjumani
Contributor

@davidjumani davidjumani commented Sep 16, 2020

Description

Adds autoscaling support for CKS (CloudStack Kubernetes Service).
Kubernetes PR: kubernetes/autoscaler#3629
Also replaces CoreOS with Debian.
Fixes #4198

TODO: Remove the templateid and template name from KubernetesClusterResponse and the DB, since the templates can vary after ACS upgrades.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

How Has This Been Tested?

TODO

@rohityadavcloud rohityadavcloud added this to the 4.16.0.0 milestone Sep 17, 2020
@davidjumani davidjumani force-pushed the add-cks-autoscaling branch 2 times, most recently from 2d14b4d to 8784cc8 Compare September 21, 2020 06:21
@davidjumani
Contributor Author

@blueorangutan package

@blueorangutan

@davidjumani a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2053

@davidjumani davidjumani force-pushed the add-cks-autoscaling branch 2 times, most recently from a155864 to d4e9a9b Compare October 1, 2020 07:24
@davidjumani
Contributor Author

@blueorangutan package

@blueorangutan

@davidjumani a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2209

@davidjumani davidjumani force-pushed the add-cks-autoscaling branch 2 times, most recently from a1bb7ae to d0569c7 Compare October 21, 2020 11:31
@davidjumani
Contributor Author

@blueorangutan package

@blueorangutan

@davidjumani a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2251

@davidjumani davidjumani force-pushed the add-cks-autoscaling branch 2 times, most recently from 5309354 to 90ff19d Compare October 22, 2020 09:26
@davidjumani davidjumani force-pushed the add-cks-autoscaling branch 9 times, most recently from ff1e0fd to 4f00c51 Compare October 28, 2020 05:14
@davidjumani
Contributor Author

@blueorangutan package

@blueorangutan

@davidjumani a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1429

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1430

@vladimirpetrov
Contributor

@blueorangutan test matrix

@blueorangutan

@vladimirpetrov a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2233)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35293 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4329-t2233-kvm-centos7.zip
Smoke tests completed. 89 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

@blueorangutan

Trillian test result (tid-2232)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36825 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4329-t2232-xenserver-71.zip
Smoke tests completed. 88 look OK, 1 have errors
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_04_extract_template | Failure | 128.33 | test_templates.py |

@blueorangutan

Trillian test result (tid-2234)
Environment: vmware-65u2 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36943 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4329-t2234-vmware-65u2.zip
Smoke tests completed. 88 look OK, 1 have errors
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_03_live_migrate_VM_with_two_data_disks | Error | 67.74 | test_vm_life_cycle.py |

@rohityadavcloud
Member

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2252)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37957 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4329-t2252-vmware-67u3.zip
Smoke tests completed. 88 look OK, 1 have errors
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_03_live_migrate_VM_with_two_data_disks | Error | 63.03 | test_vm_life_cycle.py |

@rohityadavcloud
Member

The VMware failure is the same as in the health check PR, so it is not caused by this PR. Tests LGTM.

@alexandremattioli
Contributor

LGTM

Contributor

@vladimirpetrov vladimirpetrov left a comment

LGTM based on manual testing.

SystemVM template improvements:
Tested fresh deployments with KVM (CentOS 7), VMware 67u3 and XCP-NG 8.2.
Tested upgrades from:

ACS version: 4.15
Hypervisor: KVM CentOS 8
Mgmt OS: CentOS 8

ACS version: 4.15
Hypervisor: VMware 65u2
Mgmt OS: CentOS 7

ACS version: 4.15
Hypervisor: XenServer 7.2
Mgmt OS: Ubuntu 18

ACS version: 4.15
Hypervisor: VMware 67u3
Mgmt OS: CentOS 7

ACS version: 4.14
Hypervisor: KVM CentOS 7
Mgmt OS: CentOS 7

ACS version: 4.15
Hypervisor: XCP-NG 8.2
Mgmt OS: CentOS 8

ACS version: 4.15.2
Hypervisor: KVM CentOS 8
Mgmt OS: CentOS 8

ACS version: 4.15.2
Hypervisor: VMware 67u3
Mgmt OS: CentOS 7

ACS version: 4.15.2
Hypervisor: XCP-NG 8.2
Mgmt OS: Ubuntu 20

Kubernetes Cluster auto-scaling

Tested scenarios:

Deploy a v1.20 Kubernetes cluster (not HA), set up scaling (min 1, max 2 worker nodes), increase the load, make sure the cluster scales up, stop the load, make sure the cluster scales down.

Deploy a v1.21 Kubernetes cluster (not HA), set up scaling (min 1, max 2 worker nodes), increase the load, make sure the cluster scales up, stop the load, make sure the cluster scales down.

Deploy a v1.20 Kubernetes cluster (not HA), set up scaling (min 1, max 2 worker nodes), increase the load, make sure the cluster scales up, stop the load, make sure the cluster scales down, upgrade the cluster to v1.21 and repeat the procedure.

Deploy a v1.21 Kubernetes cluster (not HA), set up scaling (min 1, max 3 worker nodes), increase the load, make sure the cluster scales up, change the scaling parameters to min 1, max 2 worker nodes, make sure the cluster scales down to 2 worker nodes, stop the load, make sure the cluster scales down to 1 worker node.

Deploy a v1.20 Kubernetes cluster (HA enabled), set up scaling (min 1, max 2 worker nodes), increase the load, make sure the cluster scales up, stop the load, make sure the cluster scales down.

Deploy a v1.21 Kubernetes cluster (HA enabled), set up scaling (min 1, max 2 worker nodes), increase the load, make sure the cluster scales up, stop the load, make sure the cluster scales down.
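
For readers following the scenarios above, the min/max behaviour they check can be pictured as clamping the worker count that the load calls for into the configured bounds. The Python sketch below is an illustration only and is not taken from the CloudStack or cluster-autoscaler code; `ScalingBounds`, `target_worker_count` and the validation rules (min of at least 1, max not below min) are hypothetical and assumed for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingBounds:
    """Hypothetical min/max worker-node limits for one cluster."""
    min_size: int
    max_size: int

    def __post_init__(self):
        # Assumed validation rules, not confirmed by the PR.
        if self.min_size < 1:
            raise ValueError("min_size must be at least 1")
        if self.max_size < self.min_size:
            raise ValueError("max_size must be >= min_size")


def target_worker_count(wanted: int, bounds: ScalingBounds) -> int:
    """Clamp the worker count the load calls for into the bounds."""
    return max(bounds.min_size, min(wanted, bounds.max_size))


# Fourth scenario: load wants 3 workers, limits are tightened, load stops.
print(target_worker_count(3, ScalingBounds(1, 3)))
print(target_worker_count(3, ScalingBounds(1, 2)))
print(target_worker_count(0, ScalingBounds(1, 2)))
```

Under these assumptions the three calls mirror the fourth scenario: the cluster sits at the old maximum under load, drops to the new maximum once the bounds are tightened, and returns to the minimum when the load stops.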

@rohityadavcloud
Member

Ping @alexandremattioli or @andrijapanicsb: are you LGTM on this as well?

@Pearl1594
Contributor

@rhtyd - @alexandremattioli has already provided their review - #4329 (comment)

@rohityadavcloud
Member

Ah okay, thanks @Pearl1594; just a note to @alexandremattioli: please use GitHub's review -> LGTM, that way it's easy to track approvals.
Let's wait for @andrijapanicsb to confirm as well.

@nvazquez
Contributor

nvazquez commented Oct 5, 2021

@andrijapanicsb @alexandremattioli please advise after your tests/review, thanks

@andrijapanicsb
Contributor

Alex is reviewing this, so I will not. Thx

@alexandremattioli
Contributor

@nvazquez all good with tests. I've used this for some customer demos and it worked very well; I retested this week and all is good. LGTM

Development

Successfully merging this pull request may close these issues.

Kubernetes container service: coreos is EOL