@GutoVeronezi
Contributor

Description

When starting a VM on KVM, ACS sets the VM's CPU topology in the VM's XML by calculating the number of CPU sockets and the number of cores per socket. In this calculation it assumes, for instance, that a VM that uses 12 vCPUs automatically gets 2 sockets with 6 cores each, or that a VM that uses 16 vCPUs automatically gets 4 sockets with 4 cores each:

private void setCpuTopology(CpuModeDef cmd, int vcpus, Map<String, String> details) {
    // multi cores per socket, for larger core configs
    int numCoresPerSocket = -1;
    if (details != null) {
        final String coresPerSocket = details.get(VmDetailConstants.CPU_CORE_PER_SOCKET);
        final int intCoresPerSocket = NumbersUtil.parseInt(coresPerSocket, numCoresPerSocket);
        if (intCoresPerSocket > 0 && vcpus % intCoresPerSocket == 0) {
            numCoresPerSocket = intCoresPerSocket;
        }
    }
    if (numCoresPerSocket <= 0) {
        if (vcpus % 6 == 0) {
            numCoresPerSocket = 6;
        } else if (vcpus % 4 == 0) {
            numCoresPerSocket = 4;
        }
    }
    if (numCoresPerSocket > 0) {
        cmd.setTopology(numCoresPerSocket, vcpus / numCoresPerSocket);
    }
}

This behavior is arbitrary, and operators cannot choose whether to use it.
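To make the effect of the calculation easier to see, here is a minimal standalone sketch of the same divisibility rules. It is not the ACS code: `CpuTopologySketch`, `computeTopology`, `parseIntOrDefault`, and the `CORES_PER_SOCKET_KEY` placeholder are hypothetical names introduced only for illustration, and the parsing helper is an assumed stand-in for `NumbersUtil.parseInt`.

```java
import java.util.Map;

public class CpuTopologySketch {
    // Placeholder key (assumption); the real code reads VmDetailConstants.CPU_CORE_PER_SOCKET.
    public static final String CORES_PER_SOCKET_KEY = "cores.per.socket.placeholder";

    /** Returns {coresPerSocket, sockets}, or null when no topology would be set. */
    public static int[] computeTopology(int vcpus, Map<String, String> details) {
        int coresPerSocket = -1;
        if (details != null) {
            int requested = parseIntOrDefault(details.get(CORES_PER_SOCKET_KEY), -1);
            // A requested value is honored only if it divides the vCPU count evenly.
            if (requested > 0 && vcpus % requested == 0) {
                coresPerSocket = requested;
            }
        }
        if (coresPerSocket <= 0) {
            // Fallback rule from the PR description: prefer 6 cores per socket, then 4.
            if (vcpus % 6 == 0) {
                coresPerSocket = 6;
            } else if (vcpus % 4 == 0) {
                coresPerSocket = 4;
            }
        }
        return coresPerSocket > 0 ? new int[] {coresPerSocket, vcpus / coresPerSocket} : null;
    }

    // Assumed stand-in for NumbersUtil.parseInt: falls back to the default on bad input.
    public static int parseIntOrDefault(String value, int defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        for (int vcpus : new int[] {8, 12, 16, 7}) {
            int[] t = computeTopology(vcpus, null);
            System.out.println(vcpus + " vCPUs -> "
                    + (t == null ? "no topology" : t[1] + " sockets x " + t[0] + " cores"));
        }
    }
}
```

Under these rules, 12 vCPUs map to 2 sockets of 6 cores and 16 vCPUs to 4 sockets of 4 cores, while a count such as 7 that is divisible by neither 6 nor 4 gets no topology at all.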

By default (see Fedora, "Setting KVM processor affinities"), if we do not define the CPU topology, the hypervisor will allocate the VM on any available CPU. Therefore, this PR externalizes a property (enable.manually.setting.cpu.topology.on.kvm.vm) in agent.properties to let operators decide whether the calculation should be done. The default behavior is still to do the calculation.
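As a rough illustration of the opt-out, the sketch below reads the property from a `java.util.Properties` object and treats a missing entry as `true`, which matches the default described above. The class and method names are hypothetical, and using `Boolean.parseBoolean` (which treats any value other than "true" as false) is an assumption about parsing, not necessarily how the agent does it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class TopologyPropertySketch {
    // Property name taken from the PR description.
    public static final String PROPERTY = "enable.manually.setting.cpu.topology.on.kvm.vm";

    /** Returns true when the topology calculation should run; a missing property means true. */
    public static boolean isTopologyCalculationEnabled(Properties agentProperties) {
        String raw = agentProperties.getProperty(PROPERTY, "true");
        return Boolean.parseBoolean(raw.trim());
    }

    /** Parses agent.properties-style text; a hypothetical helper for this example only. */
    public static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(isTopologyCalculationEnabled(parse("")));
        System.out.println(isTopologyCalculationEnabled(parse(PROPERTY + "=false")));
    }
}
```

Defaulting to `true` keeps existing deployments unchanged: only operators who explicitly add the property with the value `false` stop getting the calculated topology.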

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

It was tested in a local lab.

  • I created a host with 1 socket and 8 cores;
  • I created a VM with 8 vCPUs;
  • When the property doesn't exist in the file or when it is true, the topology is set:
    • <topology sockets='2' cores='4' threads='1'/>;
  • When the property is set to false, the topology is not set;
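The test scenario above can be reproduced on paper with a small sketch that renders the expected element for a given vCPU count and flag value. `TopologyXmlSketch` and `topologyElement` are hypothetical names; the sketch ignores the per-VM cores-per-socket detail and hard-codes `threads='1'` to mirror the element reported in the test, so it is only an approximation of what the agent generates.

```java
public class TopologyXmlSketch {
    /**
     * Returns the topology element for the given vCPU count, or an empty string
     * when the calculation is disabled or no divisor (6, then 4) applies.
     */
    public static String topologyElement(int vcpus, boolean calculationEnabled) {
        if (!calculationEnabled) {
            return "";
        }
        int cores = vcpus % 6 == 0 ? 6 : (vcpus % 4 == 0 ? 4 : -1);
        if (cores <= 0) {
            return "";
        }
        return String.format("<topology sockets='%d' cores='%d' threads='1'/>", vcpus / cores, cores);
    }

    public static void main(String[] args) {
        // 8 vCPUs with the property absent or true, as in the lab test.
        System.out.println(topologyElement(8, true));
        // 8 vCPUs with the property set to false: nothing is emitted.
        System.out.println("[" + topologyElement(8, false) + "]");
    }
}
```

Note that the 8-vCPU guest is presented with 2 sockets of 4 cores even though the lab host has 1 socket with 8 cores, which is the kind of mismatch the new property lets operators avoid.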

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@weizhouapache
Member

@blueorangutan package

@blueorangutan

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 745

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 747

@weizhouapache
Member

@blueorangutan test

@blueorangutan

@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1470)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47219 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5273-t1470-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_network.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_router_dhcphosts.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Smoke tests completed. 88 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_isolate_network_FW_PF_default_routes_egress_true | Failure | 134.07 | test_routers_network_ops.py |

Member

@GabrielBrascher GabrielBrascher left a comment

Code LGTM

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 777

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 778

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian Build Failed (tid-1516)

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian Build Failed (tid-1530)

@blueorangutan

Trillian Build Failed (tid-1542)

@DaanHoogland
Contributor

Not all of the el7 packages were built, retrying

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 826

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1591)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 72897 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5273-t1591-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_network.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_storage_policy.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_usage.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 81 look OK, 8 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_invalid_upgrade_kubernetes_cluster | Failure | 3624.29 | test_kubernetes_clusters.py |
| ContextSuite context=TestKubernetesCluster>:teardown | Error | 137.24 | test_kubernetes_clusters.py |
| test_04_extract_template | Failure | 129.25 | test_templates.py |
| test_01_volume_usage | Failure | 789.30 | test_usage.py |
| test_10_attachAndDetach_iso | Failure | 1613.33 | test_vm_life_cycle.py |
| test_06_download_detached_volume | Failure | 516.05 | test_volumes.py |
| ContextSuite context=TestVPCRedundancy>:setup | Error | 0.00 | test_vpc_redundant.py |
| ContextSuite context=TestRVPCSite2SiteVpn>:setup | Error | 0.00 | test_vpc_vpn.py |
| ContextSuite context=TestVPCSite2SiteVPNMultipleOptions>:setup | Error | 0.00 | test_vpc_vpn.py |
| ContextSuite context=TestVpcRemoteAccessVpn>:setup | Error | 0.00 | test_vpc_vpn.py |
| ContextSuite context=TestVpcSite2SiteVpn>:setup | Error | 0.00 | test_vpc_vpn.py |
| test_disable_oobm_ha_state_ineligible | Error | 1516.29 | test_hostha_kvm.py |
| test_hostha_kvm_host_degraded | Error | 13.43 | test_hostha_kvm.py |

@DaanHoogland DaanHoogland added this to the 4.16.0.0 milestone Aug 12, 2021
@nvazquez
Contributor

@blueorangutan test

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1631)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47657 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5273-t1631-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_network.py
Intermittent failure detected: /marvin/tests/smoke/test_password_server.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 85 look OK, 4 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_port_fwd_on_src_nat | Failure | 797.85 | test_network.py |
| test_02_port_fwd_on_non_src_nat | Failure | 813.18 | test_network.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 374.27 | test_privategw_acl.py |
| test_04_rvpc_privategw_static_routes | Failure | 432.49 | test_privategw_acl.py |
| test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true | Failure | 342.64 | test_routers_network_ops.py |
| test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false | Failure | 342.20 | test_routers_network_ops.py |
| test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Failure | 558.56 | test_vpc_redundant.py |
| test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Failure | 491.34 | test_vpc_redundant.py |

@nvazquez nvazquez merged commit 349120f into apache:main Aug 16, 2021