
shwstppr (Contributor) commented Mar 6, 2023

Description

Fixes #7311

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Fixes apache#7311

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
@shwstppr shwstppr changed the base branch from main to 4.17 March 6, 2023 07:11
rohityadavcloud (Member) left a comment

Left some questions/remarks. Overall, the logic seems better than before.


    protected Host getVmwareHostFromVolumeToDelete(VolumeInfo volume) {
        VirtualMachine vm = volume.getAttachedVM();
        if (vm == null) {
Member

If the volume isn't attached to a VM, would it then be removed by GC (if not by this code)?

Contributor Author

Yes. In that case, the calling method tries to find a host based on the volume's storage pool, at line 476: https://github.com/apache/cloudstack/pull/7312/files#diff-45668a1459e79e7705bb54a8f6e09d972a12ffe972b06b61468e06f4f8ce2c3dR476

        if (hostId == null) {
            return null;
        }
        HostVO host = hostDao.findById(hostId);
Member

Question: should we instead find a host that has access to the storage pool (which is not always the VM's host ID or last host ID) and make it responsible for deleting the volume?

Contributor Author

It will then try to find a host based on the volume's storage pool, at line 476: https://github.com/apache/cloudstack/pull/7312/files#diff-45668a1459e79e7705bb54a8f6e09d972a12ffe972b06b61468e06f4f8ce2c3dR476

Another option would be to remove entirely the logic that tries to find a host based on the volume's VM.
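The host-selection order debated in this thread can be sketched as below. This is a minimal, self-contained sketch, not the PR's code: Host, Vm, Volume and DeleteHostSelector are hypothetical simplifications of CloudStack's HostVO, VirtualMachine and VolumeInfo, and it assumes a cluster-scoped pool, so that "can reach the pool" means "is Up and in the pool's cluster". Following the review question, it keeps the VM's host (or last host) only when that host can reach the pool, and otherwise falls back to a pool-based lookup.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of choosing the host that should delete a volume's file. */
public class DeleteHostSelector {

    // Simplified stand-ins for CloudStack's HostVO, VirtualMachine and VolumeInfo (assumed shapes).
    public record Host(long id, long clusterId, boolean up) {}
    public record Vm(Long hostId, Long lastHostId) {}
    public record Volume(Vm attachedVm, long poolClusterId) {}

    private final List<Host> hosts;

    public DeleteHostSelector(List<Host> hosts) {
        this.hosts = List.copyOf(hosts);
    }

    private Optional<Host> findById(Long id) {
        if (id == null) {
            return Optional.empty();
        }
        return hosts.stream().filter(h -> h.id() == id).findFirst();
    }

    // Assumption: the pool is cluster-scoped, so only Up hosts in the same cluster can reach it.
    private boolean canReachPool(Host host, Volume volume) {
        return host.up() && host.clusterId() == volume.poolClusterId();
    }

    // Step 1: the attached VM's current host, else its last host, kept only if it can reach the pool.
    public Optional<Host> hostFromAttachedVm(Volume volume) {
        Vm vm = volume.attachedVm();
        if (vm == null) {
            return Optional.empty(); // detached volume: nothing to look up here
        }
        Long hostId = vm.hostId() != null ? vm.hostId() : vm.lastHostId();
        return findById(hostId).filter(h -> canReachPool(h, volume));
    }

    // Step 2: fall back to any host that can reach the volume's storage pool.
    public Optional<Host> chooseHostForDelete(Volume volume) {
        return hostFromAttachedVm(volume)
                .or(() -> hosts.stream().filter(h -> canReachPool(h, volume)).findFirst());
    }
}
```

With a VM that has already moved to another cluster, the VM's host fails the pool check, so the lookup falls back to a host in the source pool's cluster instead of returning a host that cannot reach the file.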

@DaanHoogland DaanHoogland added this to the 4.18.0.0 milestone Mar 6, 2023
@shwstppr shwstppr marked this pull request as ready for review March 6, 2023 12:58
shwstppr (Contributor Author) commented Mar 6, 2023

Tested the fix with a 4.15.2.0 patch: the DeleteCommand is sent to the correct host in the source cluster.

@blueorangutan package

blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 5682

shwstppr (Contributor Author) commented Mar 6, 2023

@blueorangutan test centos7 vmware-67u3

blueorangutan

@shwstppr a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

blueorangutan

Trillian test result (tid-6265)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40762 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7312-t6265-vmware-67u3.zip
Smoke tests completed. 99 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_04_deploy_vm_with_extraconfig_throws_exception_vmware | Error | 1.17 | test_deploy_vm_extra_config_data.py
test_05_deploy_vm_with_extraconfig_vmware | Error | 1.14 | test_deploy_vm_extra_config_data.py
test_01_deploy_vm_on_specific_host | Error | 4.31 | test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster | Error | 0.12 | test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod | Error | 0.15 | test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster | Error | 1.17 | test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod | Error | 1.14 | test_vm_deployment_planner.py

rohityadavcloud (Member) left a comment

LGTM. This would need manual and regression testing across the various permutations/combinations around the change.

vladimirpetrov (Contributor) left a comment

(wrong PR, comment removed)

shwstppr (Contributor Author)

@vladimirpetrov you probably updated the wrong PR. Re-requesting your review.

@shwstppr shwstppr requested a review from vladimirpetrov March 10, 2023 07:29
@DaanHoogland DaanHoogland modified the milestones: 4.18.0.0, 4.18.1.0 Mar 10, 2023
nvazquez (Contributor)

@shwstppr is this PR still relevant for 4.17, or can it be closed?

vladimirpetrov (Contributor) left a comment

LGTM based on manual testing.

shwstppr (Contributor Author)

@nvazquez cc @vladimirpetrov Yes, we can close this. As per my update on the linked issue, the problem is fixed by #5575: ACS marks the volume as detached once migration is complete, and deletion of the source volume file is then handled by an appropriate host.
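The detach-then-delete ordering described for #5575 can be illustrated with a small model. This is a hypothetical sketch, not the code from #5575: MigrationCleanupSketch, SourceVolume, pickDeleteHost and cleanUpAfterMigration are invented names, and the VM-first lookup with a fallback to any host in the pool's cluster is an assumed simplification of the real host lookup.

```java
import java.util.List;

/** Hypothetical sketch: detach the source volume before choosing the host that deletes it. */
public class MigrationCleanupSketch {

    // Simplified stand-ins; not CloudStack's actual classes.
    public record Host(long id, long clusterId) {}

    public static final class SourceVolume {
        public final long poolClusterId;
        public Long attachedVmHostId; // host of the VM using the volume; null once detached

        public SourceVolume(long poolClusterId, Long attachedVmHostId) {
            this.poolClusterId = poolClusterId;
            this.attachedVmHostId = attachedVmHostId;
        }
    }

    // Assumed VM-first lookup: the VM's host if attached, otherwise any host in the pool's cluster.
    public static Host pickDeleteHost(SourceVolume vol, List<Host> hosts) {
        if (vol.attachedVmHostId != null) {
            for (Host h : hosts) {
                if (h.id() == vol.attachedVmHostId) {
                    return h;
                }
            }
        }
        for (Host h : hosts) {
            if (h.clusterId() == vol.poolClusterId) {
                return h;
            }
        }
        return null; // no suitable host found
    }

    // Assumed ordering: once migration completes, mark the source volume detached, then pick the host.
    public static Host cleanUpAfterMigration(SourceVolume vol, List<Host> hosts) {
        vol.attachedVmHostId = null;
        return pickDeleteHost(vol, hosts);
    }
}
```

While the source volume still points at the migrated VM, the VM-first lookup returns the destination-cluster host, which would send the delete to a host outside the source cluster. Clearing the attachment first makes the lookup fall through to a host in the source pool's cluster.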

@shwstppr shwstppr closed this Mar 28, 2023
@weizhouapache weizhouapache removed this from the 4.18.1.0 milestone Aug 28, 2023

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vmware: vm/volume inter-cluster migration failure during vm start operation

7 participants