@DK101010
Contributor

@DK101010 DK101010 commented Jan 29, 2021

Description

Currently, hot add of memory and CPU is always enabled when it is supported. In some situations it is necessary to disable these features for a specific VM.

With this PR, a user can disable hot add of memory and CPU via the VM settings in the UI or via an API call. The default is still enabled, so it should not break existing behavior.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Manually tested in a VMware environment.

@weizhouapache
Member

code lgtm

@DaanHoogland
Contributor

@blueorangutan package


// Check for hotadd settings
- vmConfigSpec.setMemoryHotAddEnabled(vmMo.isMemoryHotAddSupported(guestOsId));
+ vmConfigSpec.setMemoryHotAddEnabled(vmMo.isMemoryHotAddSupported(guestOsId) && Boolean.parseBoolean(vmSpec.getDetails().get(VmDetailConstants.HOT_ADD_MEMORY)));
Contributor

Wouldn't this mean you can turn it off but never back on again?

Contributor Author

Nope, you can switch it on and off as you like.

Contributor

@DK101010, I may be misunderstanding, but if the VM was already created, vmMo is looked up on the hypervisor with vmMo = hyperHost.findVmOnHyperHost(vmInternalCSName);. If memoryHotAddEnabled was set to false on creation, the expression above will be false on the next start, and hence on all subsequent starts. Maybe this is outside the functional intent of this PR, but it seems that once hot add is turned off for a VM, it will not be turned back on when the user enables it again.

Contributor Author

Hi @DaanHoogland, at first I thought so too. But this method is called each time you start a VM, and each time it updates the VM config spec. By the way, the method setBasicVmConfig in VmwareHelper contains similar logic to enable/disable hot add, but it is overridden by this method. I also find that a little strange, and it may have refactoring potential, but currently I have neither the time nor the knowledge to do it. Independently of this, I tested enabling and disabling hot add for CPU and memory, and it works as expected. I also checked in vCenter that it was enabled/disabled.
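
For readers following the exchange above, here is a minimal, self-contained sketch of the gating expression from the diff. It is not CloudStack code: the class name, the detail key string, and the `defaultOnGate` variant are assumptions made for illustration. It shows two things. First, the flag is recomputed from the guest OS capability and the stored VM detail on every call, so it can be switched both ways. Second, `Boolean.parseBoolean(null)` returns `false`, so a VM without the detail would end up with hot add disabled unless the detail is populated elsewhere, which is not visible in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: mirrors the gating expression from the diff, not CloudStack code. */
class HotAddDetailSketch {
    // Hypothetical key; the PR refers to it as VmDetailConstants.HOT_ADD_MEMORY.
    static final String HOT_ADD_MEMORY = "hotAddMemory";

    /** Gating as written in the diff: a missing detail parses to false. */
    static boolean strictGate(boolean guestOsSupportsHotAdd, Map<String, String> details) {
        return guestOsSupportsHotAdd && Boolean.parseBoolean(details.get(HOT_ADD_MEMORY));
    }

    /** Assumed variant that treats a missing detail as "enabled". */
    static boolean defaultOnGate(boolean guestOsSupportsHotAdd, Map<String, String> details) {
        String raw = details.get(HOT_ADD_MEMORY);
        boolean requested = (raw == null) || Boolean.parseBoolean(raw);
        return guestOsSupportsHotAdd && requested;
    }

    public static void main(String[] args) {
        Map<String, String> details = new HashMap<>();
        System.out.println("no detail, strict:     " + strictGate(true, details));
        System.out.println("no detail, default-on: " + defaultOnGate(true, details));

        // The result follows the stored detail each time it is evaluated.
        details.put(HOT_ADD_MEMORY, "false");
        System.out.println("detail=false:          " + defaultOnGate(true, details));
        details.put(HOT_ADD_MEMORY, "true");
        System.out.println("detail=true:           " + defaultOnGate(true, details));
    }
}
```

Whether a default-on variant is needed depends on how the detail is written when a VM is created, so treat the second method as one possible reading of "the default is still enabled" rather than what the PR does.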

@harikrishna-patnala
Contributor

@DK101010 we already have a flag per VM to enable or disable dynamic scalability of resources on the VM.
image
May I know why we need these settings again?

Also keeping two different flags for CPU and memory may conflict during actual VM deployment (with different combinations of true and false) since we are controlling the dynamic scalability of VM with only one flag at global/zone setting.

@DK101010
Contributor Author

DK101010 commented Feb 2, 2021

Hi @harikrishna-patnala, I did not know about this feature until now. I checked it, but how can I disable/enable hot add with it? I can set dynamic scalability to true/false, but hot add of CPU and memory stays enabled.

@DK101010
Contributor Author

DK101010 commented Feb 3, 2021

@harikrishna-patnala @DaanHoogland Here is an alternative implementation that uses the dynamic scalability flag. What do you think?

}
if (vm.getVirtualMachine() instanceof VMInstanceVO) {
    VMInstanceVO vmInstanceVO = (VMInstanceVO) vm.getVirtualMachine();
    to.setEnableDynamicallyScaleVm(vmInstanceVO.isDynamicallyScalable());
Contributor

HypervisorGuruBase is already setting this parameter in toVirtualMachineTO(). Can you please double-check whether this is necessary or redundant in VmwareVMImplementer?
to.setEnableDynamicallyScaleVm(isDynamicallyScalable);

Contributor Author

@harikrishna-patnala Hmm... during my test I could enable/disable the flag in the frontend, but in the backend it stayed false. That is the reason for my implementation in VmwareVmImplementer.java.

I checked the HypervisorGuru and found the following line:
Boolean isDynamicallyScalable = vmInstance.isDynamicallyScalable() && UserVmManager.EnableDynamicallyScaleVm.valueIn(vm.getDataCenterId());

I think I now understand what you mean by zone settings. ;) But I wonder why we need two flags for the same thing. In my opinion it is confusing and inconvenient for a user to have to enable two flags to use this feature.
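
The two-level check quoted above can be sketched in isolation as follows. This is a rough illustration under stated assumptions, not the HypervisorGuruBase implementation: the class, the `zoneOverrides` map, and the `globalValue` parameter are hypothetical stand-ins for `UserVmManager.EnableDynamicallyScaleVm.valueIn(zoneId)`, assuming that a zone-level value overrides a global one.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a per-VM flag that is gated by a zone-scoped setting. */
class DynamicScalingFlagSketch {
    /**
     * vmFlag        stands in for vmInstance.isDynamicallyScalable()
     * zoneOverrides hypothetical per-zone values of the setting
     * globalValue   hypothetical global value used when a zone has no override
     */
    static boolean effective(boolean vmFlag, long zoneId,
                             Map<Long, Boolean> zoneOverrides, boolean globalValue) {
        boolean zoneAllows = zoneOverrides.getOrDefault(zoneId, globalValue);
        return vmFlag && zoneAllows;
    }

    public static void main(String[] args) {
        Map<Long, Boolean> zoneOverrides = new HashMap<>();
        zoneOverrides.put(1L, Boolean.FALSE); // zone 1 forbids dynamic scaling

        // The VM flag alone is not enough when the zone forbids the feature.
        System.out.println(effective(true, 1L, zoneOverrides, true));
        // Zone 2 has no override, so the global value applies.
        System.out.println(effective(true, 2L, zoneOverrides, true));
        // The zone allows it, but the VM itself opted out.
        System.out.println(effective(false, 2L, zoneOverrides, true));
    }
}
```

Read this way, the zone or global value acts as an operator-level permission and the VM flag as a per-VM opt-in, which is why both have to be true for the flag to reach the hypervisor.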

Contributor

A bit of background, guys: the VmwareVMImplementer is a worker class that is part of the VmwareGuru and is meant to reduce complexity by extracting the deploy and start code. It would be good to put shared code in a VmwareGuruUtilities class to prevent duplication. I can imagine that setting flags can be done on implement as well as on restart/migrate or other methods. I might be guilty of this redundancy, but take care: the setting might happen once in one scenario but twice in another ...

Contributor

Thanks @DaanHoogland for explaining that.
In this case, setting isDynamicallyScalable on the VM has no dependency on the hypervisor type, which is why HypervisorGuruBase sets the dynamic scaling flag on the VM while preparing the TO (transfer object). Setting this flag in either VMwareGuru or VMwareVMImplementer is not required.

@DK101010 I would suggest reverting the change in HypervisorGuruBase in which you ignored the global/zone setting. It is required because the global/zone-level setting decides whether dynamic scaling can be enabled in that management setup at all; vmInstance.isDynamicallyScalable() alone is not sufficient. There is another PR, #4643, which fixes all these settings. For this PR you can keep the VMwareResource changes.

Contributor Author

@DaanHoogland Thanks for the input. Good to know for the future.

@harikrishna-patnala Sure, I can revert this, but I still don't find it practical to have to enable two flags to turn a feature on ;).

@DK101010 DK101010 force-pushed the feat/hot_add_memory_cpu branch from d478cad to d076f1c Compare February 18, 2021 14:23
}

if (!vmMo.isMemoryHotAddSupported(guestOsId) && vmSpec.isEnableDynamicallyScaleVm()) {
    s_logger.warn("hotadd is not supported, dynamic scaling feature can not be applied " + vmInternalCSName);
Contributor

@harikrishna-patnala harikrishna-patnala Apr 20, 2021

Thanks @DK101010 for making the changes, and my apologies for coming to this PR so late. Can you please add the reason, namely that the guest OS does not support hot add? Something like: "hotadd of memory is not supported by the guest OS, dynamic scaling feature can not be applied " + vmInternalCSName

And please add another log for CPU hot add.
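
As a rough sketch of the request above, the memory and CPU checks can be kept separate so that each produces its own warning. Everything here is an assumption made for illustration: the class and method names are hypothetical, the CPU message is written by analogy with the suggested memory message, and warnings are collected in a list instead of going through `s_logger` so the example runs without a logging library.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: separate hot add warnings for memory and CPU. */
class HotAddWarningSketch {
    /** Hot add is enabled only if the guest OS supports it and dynamic scaling is requested. */
    static boolean hotAddEnabled(boolean guestOsSupports, boolean dynamicScalingRequested) {
        return guestOsSupports && dynamicScalingRequested;
    }

    /** Returns one warning per resource whose hot add was requested but is unsupported. */
    static List<String> warnings(String vmName, boolean dynamicScalingRequested,
                                 boolean memoryHotAddSupported, boolean cpuHotAddSupported) {
        List<String> out = new ArrayList<>();
        if (dynamicScalingRequested && !memoryHotAddSupported) {
            out.add("hotadd of memory is not supported by the guest OS, "
                    + "dynamic scaling feature can not be applied " + vmName);
        }
        if (dynamicScalingRequested && !cpuHotAddSupported) {
            // Assumed wording, mirroring the memory message.
            out.add("hotadd of cpu is not supported by the guest OS, "
                    + "dynamic scaling feature can not be applied " + vmName);
        }
        return out;
    }

    public static void main(String[] args) {
        // "i-2-10-VM" is a made-up internal VM name.
        for (String w : warnings("i-2-10-VM", true, false, true)) {
            System.out.println("WARN " + w);
        }
        System.out.println(hotAddEnabled(false, true)); // unsupported guest OS
        System.out.println(hotAddEnabled(true, true));  // supported and requested
    }
}
```

Keeping the two conditions apart means a guest OS that supports only one kind of hot add is reported precisely instead of with a single generic message.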

Contributor Author

@harikrishna-patnala no problem, I will adapt it.

Contributor

@harikrishna-patnala harikrishna-patnala left a comment

LGTM

DK101010 and others added 2 commits May 5, 2021 15:47
…vmware/resource/VmwareResource.java

Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
…vmware/resource/VmwareResource.java

Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
Contributor

@sureshanaparti sureshanaparti left a comment

clgtm

@apache apache deleted a comment from blueorangutan Jun 4, 2021
@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 163

@blueorangutan

Trillian test result (tid-857)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37551 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4630-t857-vmware-67u3.zip
Smoke tests completed. 88 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Contributor

@borisstoyanov @vladimirpetrov any extra tests required? cc @sureshanaparti @nvazquez

@rohityadavcloud
Member

LGTM, need some testing/confirmation

@nvazquez
Contributor

nvazquez commented Aug 6, 2021

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 786

@DaanHoogland
Contributor

@blueorangutan test matrix

@blueorangutan

@DaanHoogland a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@blueorangutan

Trillian Build Failed (tid-1529)

@blueorangutan

Trillian test result (tid-1527)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35318 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4630-t1527-xenserver-71.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Smoke tests completed. 89 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@blueorangutan

Trillian test result (tid-1528)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 57877 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4630-t1528-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_diagnostics.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_outofbandmanagement.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 87 look OK, 2 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
test_05_ping_in_cpvm_success | Failure | 14.35 | test_diagnostics.py
test_06_download_detached_volume | Failure | 3990.75 | test_volumes.py

@rohityadavcloud
Member

@blueorangutan centos7 vmware-67u3

@rohityadavcloud
Member

@blueorangutan help

@blueorangutan

@rhtyd I understand these words: "help", "hello", "thanks", "package", "test"
Test command usage: test [mgmt os] [hypervisor] [keepEnv]
Mgmt OS options: ['centos7', 'centos6', 'alma8', 'ubuntu18', 'suse15', 'ubuntu20', 'rocky8', 'centos8']
Hypervisor options: ['kvm-centos6', 'kvm-centos7', 'kvm-centos8', 'kvm-rocky8', 'kvm-alma8', 'kvm-ubuntu18', 'kvm-ubuntu20', 'kvm-suse15', 'vmware-55u3', 'vmware-60u2', 'vmware-65u2', 'vmware-67u3', 'vmware-70u1', 'xenserver-65sp1', 'xenserver-71', 'xenserver-74', 'xcpng74', 'xcpng76', 'xcpng80', 'xcpng81']
Note: when keepEnv is passed, you need to specify mgmt server os and hypervisor or use the matrix command.

Blessed contributors for kicking Trillian test jobs: ['rhtyd', 'nvazquez', 'PaulAngus', 'borisstoyanov', 'DaanHoogland', 'shwstppr', 'andrijapanicsb', 'Spaceman1984', 'Pearl1594', 'davidjumani', 'harikrishna-patnala', 'vladimirpetrov', 'sureshanaparti', 'weizhouapache']

@rohityadavcloud
Member

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1546)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 57475 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4630-t1546-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_nested_virtualization.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 85 look OK, 4 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
test_03_vpc_privategw_restart_vpc_cleanup | Failure | 442.28 | test_privategw_acl.py
test_04_rvpc_privategw_static_routes | Failure | 753.07 | test_privategw_acl.py
test_01_isolate_network_FW_PF_default_routes_egress_true | Error | 155.63 | test_routers_network_ops.py
test_02_isolate_network_FW_PF_default_routes_egress_false | Failure | 147.06 | test_routers_network_ops.py
test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true | Failure | 436.94 | test_routers_network_ops.py
test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false | Failure | 449.87 | test_routers_network_ops.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Failure | 686.77 | test_vpc_redundant.py
test_05_rvpc_multi_tiers | Failure | 635.23 | test_vpc_redundant.py
test_02_cancel_host_maintenace_with_migration_jobs | Error | 165.73 | test_host_maintenance.py
test_03_cancel_host_maintenace_with_migration_jobs_failure | Error | 21.98 | test_host_maintenance.py

Contributor

@nvazquez nvazquez left a comment

Code LGTM

@nvazquez
Contributor

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 857

@nvazquez
Contributor

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1622)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 57150 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4630-t1622-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_list_ids_parameter.py
Intermittent failure detected: /marvin/tests/smoke/test_network.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 87 look OK, 2 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
test_04_rvpc_privategw_static_routes | Failure | 1477.14 | test_privategw_acl.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Failure | 666.45 | test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Failure | 809.77 | test_vpc_redundant.py
test_05_rvpc_multi_tiers | Failure | 607.88 | test_vpc_redundant.py

Contributor

@nvazquez nvazquez left a comment

Tested on VMware 6.5u2:

  • Hot add unsupported guest OS (Other Linux 64 bits) -> CPU and memory hot add flags always disabled, irrespective of the dynamic scaling values
  • Hot add supported guest OS (Debian 10 64 bits) -> CPU and memory hot add flags enabled/disabled on vCenter accordingly

Note: support compatibility checked on: https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=software&testConfig=16&productid=48848&supRel=408,&deviceCategory=software&details=1&releases=408&operatingSystems=260&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc&testConfig=16

@nvazquez nvazquez merged commit 1bfb2f9 into apache:main Aug 13, 2021
8 participants