
@khmarochos
Contributor

@khmarochos khmarochos commented May 21, 2018

Description

I propose killing the qemu-kvm processes of the VMs whose volumes are located on NFS storage, instead of rebooting the whole host.
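A minimal sketch of how those processes might be identified on the host. It assumes a Linux host where NFS primary storage shows up in /proc/mounts with filesystem type nfs or nfs4, and where the disk image paths appear on the qemu-kvm command line. The helper names (nfs_mount_points, references_nfs, nfs_backed_qemu_pids) are hypothetical and are not part of CloudStack or of this PR.

```python
"""Hypothetical sketch: find qemu-kvm processes whose disk images sit on NFS.

Assumes a Linux /proc layout; function names are illustrative only.
"""
import os

NFS_TYPES = {"nfs", "nfs4"}


def nfs_mount_points(mounts_text):
    """Return the mount points of NFS filesystems in /proc/mounts-style text.

    Note: /proc/mounts escapes spaces as \\040; this sketch does not decode them.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.append(fields[1])
    return points


def references_nfs(cmdline_args, mount_points):
    """True if any argument contains a path under one of the NFS mount points.

    A substring check is used because qemu embeds paths in options such as
    "file=/mnt/x/vol.qcow2,format=qcow2". The trailing slash keeps /mnt/abc
    from matching /mnt/abcdef.
    """
    for arg in cmdline_args:
        for point in mount_points:
            if point.rstrip("/") + "/" in arg:
                return True
    return False


def nfs_backed_qemu_pids(proc="/proc"):
    """Scan /proc for qemu processes that reference an NFS-mounted path."""
    with open(os.path.join(proc, "mounts")) as f:
        points = nfs_mount_points(f.read())
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        # cmdline is NUL-separated; decode each argument leniently
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args and "qemu" in os.path.basename(args[0]) and references_nfs(args, points):
            pids.append(int(entry))
    return pids
```

On a real host, nfs_backed_qemu_pids() would return the candidate PIDs to act on instead of resetting the host.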

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (code refactoring and cleanup that may add test cases)

GitHub Issue/PRs

Fixes: #2657

Screenshots (if appropriate):

How Has This Been Tested?

Tested on a KVM host in my environment running CentOS 7.x.

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Testing

  • I have added tests to cover my changes.
  • All relevant new and existing integration tests have passed.
  • A full integration test suite with all tests that can run in my environment has passed.


@DaanHoogland DaanHoogland changed the title from "Fix issue #2657" to "improve KVM hosts reset (by not using a script)" on May 22, 2018
@DaanHoogland DaanHoogland changed the title from "improve KVM hosts reset (by not using a script)" to "improve KVM hosts reset (by not rebooting in script)" on May 22, 2018
@wido
Contributor

wido commented Jun 7, 2018

But will this work? A hanging NFS mount puts those processes into state 'D' (uninterruptible sleep), and they can't be killed, not even with -9.
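State 'D' means the task is blocked inside the kernel, here typically waiting on the unresponsive NFS server, and it generally does not act on signals, SIGKILL included, until that wait ends. A hedged sketch for spotting such processes, assuming a Linux /proc; parse_stat and stuck_processes are hypothetical helper names.

```python
"""Hypothetical sketch: list processes stuck in uninterruptible sleep ('D')."""
import os


def parse_stat(stat_line):
    """Return (comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' instead of on whitespace.
    """
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    comm = stat_line[start + 1:end]
    state = stat_line[end + 1:].split()[0]  # field 3: the state letter
    return comm, state


def stuck_processes(name_fragment="qemu", proc="/proc"):
    """Return (pid, comm) for matching processes currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process vanished between listdir() and open()
        if state == "D" and name_fragment in comm:
            stuck.append((int(entry), comm))
    return stuck
```

The kernel truncates comm to 15 characters, so matching on a short fragment such as "qemu" covers both qemu-kvm and qemu-system-x86_64.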

@khmarochos
Contributor Author

khmarochos commented Jun 7, 2018 via email

@wido
Contributor

wido commented Jun 8, 2018

@Melnik13 I understand what you are saying, but a process in state D won't die, so it isn't truly fenced. That's the difficulty.

You might want to use 'kill -9' just to be sure.
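Following the 'kill -9' suggestion, a hedged sketch of an escalation that does not assume the kill worked: send SIGTERM, then SIGKILL, and report the VM as fenced only once the process has actually exited. A process still present (for example in state 'D') after both signals yields False, leaving the caller to fall back to something stronger such as a host reset. fence_process, proc_state, wait_gone and the timeout values are illustrative assumptions, not CloudStack code; Linux /proc is assumed.

```python
"""Hypothetical sketch: SIGTERM, then SIGKILL, then verify the process is gone."""
import os
import signal
import time


def proc_state(pid):
    """Return the /proc state letter for pid, or None if it no longer exists."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            line = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    return line[line.rindex(")") + 1:].split()[0]


def wait_gone(pid, timeout):
    """Poll until pid has exited (absent or zombie).

    Returns None once it is gone, otherwise the last observed state letter.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = proc_state(pid)
        if state in (None, "Z", "X"):
            return None
        if time.monotonic() >= deadline:
            return state
        time.sleep(0.1)


def fence_process(pid, grace=10.0, kill_wait=5.0):
    """Try to terminate pid; return True only if it is confirmed dead.

    False means the process survived both signals (e.g. stuck in 'D'),
    so the VM is NOT fenced and a stronger measure is still needed.
    """
    for sig, wait in ((signal.SIGTERM, grace), (signal.SIGKILL, kill_wait)):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return True  # already gone
        if wait_gone(pid, wait) is None:
            return True
    return False
```

The explicit check is the point of the design: os.kill() returns successfully even when the target is stuck in 'D', so a successful signal delivery alone says nothing about whether the VM is actually gone.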

@borisstoyanov
Contributor

@blueorangutan package

@blueorangutan

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✖debian. JID-2147

@borisstoyanov
Contributor

@blueorangutan test

@blueorangutan

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2807)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28397 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2658-t2807-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_certauthority_root.py
Intermittent failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_outofbandmanagement.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 62 look OK, 5 have error(s)
Only failed test results are shown below:

Test                                                Result  Time (s)  Test File
test_provision_certificate                          Error   9.33      test_certauthority_root.py
ContextSuite context=TestDeployVirtioSCSIVM>:setup  Error   0.00      test_deploy_virtio_scsi_vm.py
test_03_vpc_privategw_restart_vpc_cleanup           Error   144.01    test_privategw_acl.py
test_01_secured_vm_migration                        Error   39.68     test_vm_life_cycle.py
test_02_not_secured_vm_migration                    Error   39.64     test_vm_life_cycle.py
test_03_secured_to_nonsecured_vm_migration          Error   40.62     test_vm_life_cycle.py
test_04_nonsecured_to_secured_vm_migration          Error   39.65     test_vm_life_cycle.py
test_11_migrate_volume_and_change_offering          Error   128.60    test_volumes.py

@DaanHoogland
Contributor

@Melnik13 @wido I saw related PRs. Is this one still valid?

@rafaelweingartner
Member

@Melnik13 is this PR still valid?

@khmarochos
Contributor Author

@rafaelweingartner,
As far as I can see, the changes have already been made to scripts/vm/hypervisor/kvm/kvmheartbeat.sh, so this PR can be considered closed.
Thanks, everyone!



Development

Successfully merging this pull request may close these issues.

KVM hosts are being reset by a script

6 participants