Conversation

@harikrishna-patnala
Contributor

@harikrishna-patnala harikrishna-patnala commented Mar 27, 2023

Description

This PR fixes issue #7357, where VM deployment fails when storage tags are used in a certain way with multiple clusters, even though a suitable cluster to deploy the VM is available.

The root cause is how certain variables are used to work out the final set of pools to avoid in the deployment planner.

[Screenshot: the deployment planner code where originalAvoidPoolSet is assigned directly from avoid.getPoolsToAvoid()]

Because of this, "originalAvoidPoolSet" gets updated whenever avoid.getPoolsToAvoid() is updated, which should not be the case.
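The underlying problem is plain Java reference aliasing. Below is a minimal, self-contained sketch of the pattern (an illustration only, not the planner code; the class name and pool IDs 1L/2L are made up):

import java.util.HashSet;
import java.util.Set;

public class AvoidSetAliasingSketch {
    public static void main(String[] args) {
        Set<Long> poolsToAvoid = new HashSet<>();   // stands in for avoid.getPoolsToAvoid()
        poolsToAvoid.add(1L);

        // Buggy pattern: the "snapshot" is just another reference to the same set,
        // so every later update to poolsToAvoid also shows up in the snapshot.
        Set<Long> aliasedSnapshot = poolsToAvoid;

        // Fixed pattern: copy the contents so the snapshot stays frozen.
        Set<Long> copiedSnapshot = new HashSet<>(poolsToAvoid);

        poolsToAvoid.add(2L);   // the planner later adds more pools to avoid

        System.out.println(aliasedSnapshot); // contains 1 and 2 -- snapshot silently changed
        System.out.println(copiedSnapshot);  // contains only 1  -- original avoid set preserved
    }
}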

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

@harikrishna-patnala harikrishna-patnala force-pushed the FixVMdeploymentWithAvoidVariable branch from 74681d4 to 4c60621 Compare March 27, 2023 11:25
@harikrishna-patnala harikrishna-patnala changed the base branch from main to 4.17 March 27, 2023 11:25
@harikrishna-patnala
Contributor Author

This issue is present in 4.17 as well, so I rebased the branch onto 4.17 since this is a critical issue.

@harikrishna-patnala
Contributor Author

@blueorangutan package

@blueorangutan

@harikrishna-patnala a Jenkins job has been kicked to build packages. It will be bundled with SystemVM template(s). I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 5785

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bug: A (0 Bugs)
Vulnerability: A (0 Vulnerabilities)
Security Hotspot: A (0 Security Hotspots)
Code Smell: A (0 Code Smells)

Coverage: 0.0%
Duplication: 0.0%

Comment on lines +1579 to +1583
  Set<Long> originalAvoidPoolSet = new HashSet<>();
  if (avoid.getPoolsToAvoid() != null) {
      originalAvoidPoolSet.addAll(avoid.getPoolsToAvoid());
  }
- Set<Long> poolsToAvoidOutput = new HashSet<Long>(originalAvoidPoolSet);
+ Set<Long> poolsToAvoidOutput = new HashSet<>(originalAvoidPoolSet);
@DaanHoogland
Contributor

@harikrishna-patnala, what is the logic here? As I see it, it just adds to the avoid set to remember it later and doesn't clean the pools added while trying the prior cluster out of the avoid set when trying this cluster.

If it works, it works, but it doesn't read as logical to me.

@harikrishna-patnala
Contributor Author

The logic is already down there, @DaanHoogland:

    if (avoid.getPoolsToAvoid() != null) {
        poolsToAvoidOutput.addAll(avoid.getPoolsToAvoid());
        avoid.getPoolsToAvoid().retainAll(originalAvoidPoolSet);
    }
    if (!foundPotentialPools) {
        s_logger.debug("No suitable pools found for volume: " + toBeCreated + " under cluster: " + plan.getClusterId());
        // No suitable storage pools found under this cluster for this
        // volume. - remove any suitable pools found for other volumes.
        // All volumes should get suitable pools under this cluster;
        // else we cant use this cluster.
        suitableVolumeStoragePools.clear();
        break;
    }
}
HashSet<Long> toRemove = new HashSet<Long>();
for (List<StoragePool> lsp : suitableVolumeStoragePools.values()) {
    for (StoragePool sp : lsp) {
        toRemove.add(sp.getId());
    }
}
poolsToAvoidOutput.removeAll(toRemove);
if (avoid.getPoolsToAvoid() != null) {
    avoid.getPoolsToAvoid().addAll(poolsToAvoidOutput);
}
if (suitableVolumeStoragePools.isEmpty()) {
    s_logger.debug("No suitable pools found");
}

The problem was just that originalAvoidPoolSet was not maintained properly because of "Set originalAvoidPoolSet = avoid.getPoolsToAvoid();": later in the planner, whenever avoid.getPoolsToAvoid() got updated, originalAvoidPoolSet got updated too, since both variables referenced the same set.

For example, with 2 volumes, during the per-volume loop the 1st volume adds p3 and p4 to avoid.getPoolsToAvoid(), and they also end up in originalAvoidPoolSet. For the 2nd volume, p3 and p4 should not be in the avoid set, but since originalAvoidPoolSet also got updated, the planner keeps avoiding p3 and p4.

Yes, this is working in my testing. I've verified it with the debugger too.
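To make the snapshot/reset behaviour concrete, here is a hypothetical walk-through of the two-volume example above (the class name and the pool IDs p3 = 3L, p4 = 4L are made up; this only illustrates the idea, not the actual planner code):

import java.util.HashSet;
import java.util.Set;

public class TwoVolumeAvoidSetWalkthrough {
    public static void main(String[] args) {
        Set<Long> poolsToAvoid = new HashSet<>();                         // stands in for avoid.getPoolsToAvoid()
        Set<Long> originalAvoidPoolSet = new HashSet<>(poolsToAvoid);     // frozen copy (the fix)
        Set<Long> poolsToAvoidOutput = new HashSet<>(originalAvoidPoolSet);

        // Volume 1: the allocators reject p3 and p4 and add them to the avoid set.
        poolsToAvoid.add(3L);
        poolsToAvoid.add(4L);

        // Remember what was avoided for the final output, then reset the avoid set
        // back to the original snapshot before processing the next volume.
        poolsToAvoidOutput.addAll(poolsToAvoid);
        poolsToAvoid.retainAll(originalAvoidPoolSet);

        // Volume 2 now starts from the original (empty) avoid set, so p3 and p4
        // are still candidates for it -- which is what the fix restores.
        System.out.println("avoid set for volume 2: " + poolsToAvoid);    // empty
        System.out.println("accumulated output: " + poolsToAvoidOutput);  // contains 3 and 4
    }
}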

@DaanHoogland
Contributor

Thanks @harikrishna-patnala, it still doesn't read as logical, but with your explanation it makes total sense 👍

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@harikrishna-patnala
Contributor Author

@blueorangutan test

@blueorangutan

@harikrishna-patnala a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@rajujith rajujith left a comment

Followed the issue reproduction steps and the issue is not observed. It's LGTM.

@blueorangutan

Trillian test result (tid-6327)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37558 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7372-t6327-kvm-centos7.zip
Smoke tests completed. 98 look OK, 3 have errors, 0 did not run
Only failed and skipped test results are shown below:

Test Result Time (s) Test File
test_02_deploy_vm_with_extraconfig_kvm Error 1.16 test_deploy_vm_extra_config_data.py
test_03_update_vm_with_extraconfig_kvm Error 1.13 test_deploy_vm_extra_config_data.py
test_08_upgrade_kubernetes_ha_cluster Failure 670.38 test_kubernetes_clusters.py
test_02_deploy_vm_on_specific_cluster Error 1.11 test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod Error 0.15 test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster Error 0.14 test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod Error 0.14 test_vm_deployment_planner.py

@DaanHoogland DaanHoogland merged commit 9fb2005 into apache:4.17 Mar 29, 2023
DaanHoogland added a commit that referenced this pull request Mar 29, 2023
* 4.18:
  Fixed avoid set variables which is causing deployment failures (#7372)
  Add service ip to listManagementServers API response (#7374)
  UI: fix default network is not passed to deployvm API (#7367)
  ui: Added UEFI support flag in host details view (#7361)
  removed vulnerable workflow
@rohityadavcloud rohityadavcloud added this to the 4.18.1.0 milestone Mar 31, 2023
kishankavala pushed a commit to shapeblue/cloudstack that referenced this pull request Apr 13, 2023