Skip to content

Conversation

@vishesh92
Copy link
Member

Description

This PR adds a check on host's status. Without this if the agent is not in Up or Connecting state, expunging of a VM fails.

Steps to reproduce:

  1. Enable vm.configdrive.force.host.cache.use in Global Configuration.
  2. Create a L2 network with config drive
  3. Deploy a vm with the L2 network created in previous step
  4. Stop the vm and destroy vm (not expunge it)
  5. Stop the cloudstack-agent on the VM's host
  6. Expunge the vm

Fixes: #7428

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

Setup an environment using mbx and following the steps above to reproduce it.

@vishesh92
Copy link
Member Author

@blueorangutan package

@vishesh92
Copy link
Member Author

Thanks @weizhouapache for helping in debugging this issue.

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm

I think it is safe to leave the configdrive ISO on the host cache.
It would be good if it can be cleaned when cloudstack agent is back to normal

return false;
}
if (!Arrays.asList(Status.Up, Status.Connecting).contains(hostVO.getStatus())) {
LOG.warn(String.format("Host status %s is not Up or Connecting, skipping deletion of config-drive ISO on host cache", hostId));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weizhouapache @vishesh92 Would connecting be included, asking as what if the host isn't up to take the command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohityadavcloud This returns false if the status is NOT Up or Connecting..
I matching the checks which are present here: https://github.com/apache/cloudstack/blob/4.18/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentManagerImpl.java#L970

Copy link
Member

@rohityadavcloud rohityadavcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but left one remark. I haven't tested it but this may require manual testing if it's not covered by an integration test. cc @vishesh92 @borisstoyanov @kiranchavala @vladimirpetrov

@vishesh92
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@vishesh92 a [LL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [LL]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 6071

@rohityadavcloud rohityadavcloud added this to the 4.18.1.0 milestone Jun 5, 2023
@rohityadavcloud rohityadavcloud marked this pull request as ready for review June 12, 2023 10:10
Copy link
Contributor

@kiranchavala kiranchavala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Tested the steps manually and was able to expunge the vm

  1. Enable vm.configdrive.force.host.cache.use in Global Configuration.
    
  2. Create a L2 network with config drive
    
  3. Deploy a vm with the L2 network created in previous step
    
  4. Stop the vm and destroy vm (not expunge it)
    
  5. Stop the cloudstack-agent on the VM's host
    
  6. Expunge the vm
    

@rohityadavcloud
Copy link
Member

@blueorangutan package

@rohityadavcloud rohityadavcloud linked an issue Jun 22, 2023 that may be closed by this pull request
@blueorangutan
Copy link

@rohityadavcloud a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 6307

@kiranchavala
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@kiranchavala a [SF] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-6807)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 43388 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7584-t6807-kvm-centos7.zip
Smoke tests completed. 107 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_migrate_VM_and_root_volume Error 82.93 test_vm_life_cycle.py
test_02_migrate_VM_with_two_data_disks Error 58.59 test_vm_life_cycle.py

@rohityadavcloud
Copy link
Member

Review smoketests LGTM, other PRs have the same intermittent issue.

@rohityadavcloud rohityadavcloud merged commit 0acc66f into apache:4.18 Jun 23, 2023
@rohityadavcloud rohityadavcloud deleted the configdrive-add-status-check branch June 23, 2023 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Destroy vm API errors with failure to delete ConfigDrive

7 participants