@weizhouapache
Member

@weizhouapache weizhouapache commented Jun 23, 2021

Description

This PR fixes a series of bugs found when running test_configdrive.py:

(1) detach and attach the ConfigDrive ISO on KVM in the cases where the ISO is updated, for example when a new NIC is plugged or the VM is migrated.

(2) use "copy" instead of "mv" when creating the ConfigDrive ISO in the SSVM, because the ISO disappears in the VM if "mv" is used.

(3) fix two exceptions when enabling/disabling static NAT on VMs with a ConfigDrive ISO (the public IP in the ConfigDrive ISO is updated in these two cases).

(4) fix two issues in setup/bindir/cloud-set-guest-sshkey-configdrive.in

(5) fix a few bugs in test/integration/component/test_configdrive.py

(6) build and use centos55 with both sshkey and configdrive support for testing.
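
Item (2) relies on the general difference between copying and moving a file: a copy leaves the source path in place, while a move removes it. The PR does not spell out the exact mechanism in the SSVM, so the following is only a minimal sketch of that general difference using `java.nio.file.Files`. The class name, the helper names, and the paths are hypothetical and are not taken from the CloudStack code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ConfigDriveIsoPublish {

    // Hypothetical helper: publish a freshly built ISO by copying it.
    // The source file stays in place after the call.
    static void publishByCopy(Path builtIso, Path target) throws IOException {
        Files.createDirectories(target.getParent());
        Files.copy(builtIso, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Hypothetical helper: the same step done as a move, for contrast.
    // The source path no longer exists after the call.
    static void publishByMove(Path builtIso, Path target) throws IOException {
        Files.createDirectories(target.getParent());
        Files.move(builtIso, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("configdrive-demo");
        Path iso = Files.write(workDir.resolve("configdrive.iso"), new byte[] {1, 2, 3});

        publishByCopy(iso, workDir.resolve("copied/configdrive.iso"));
        System.out.println("after copy, source exists: " + Files.exists(iso)); // true

        publishByMove(iso, workDir.resolve("moved/configdrive.iso"));
        System.out.println("after move, source exists: " + Files.exists(iso)); // false
    }
}
```

Anything that still refers to the original path keeps working after the copy, but not after the move.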

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Tested with KVM.

@rohityadavcloud rohityadavcloud added this to the 4.15.1.0 milestone Jun 24, 2021
@rohityadavcloud
Member

Should we consider that for 4.15.1?

@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@weizhouapache weizhouapache changed the title from "server: fix failed to apply userdata when enable static nat" to "configdrive: fix some failures in tests/component/test_configdrive.py" Jun 24, 2021
@weizhouapache
Member Author

> Should we consider that for 4.15.1?

@rhtyd
These are bug fixes for some failures in tests/component/test_configdrive.py
(the testing is part of my POC of the configdrive feature for the VR agent).

I do not think they are critical issues. We can move it to 4.15.2.0 or 4.16.0.0.

The tests still fail for isolated networks and VPCs for now (shared networks work). Very few users use configdrive for VMs on isolated networks and VPCs.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 347

@rohityadavcloud
Member

@weizhouapache We know several users who use the config drive feature, especially for L2 networks and in VR-less networks. I'm planning to cut RC3 on Monday. Would this PR be ready before that?

@rohityadavcloud
Member

I guess the question is - is the feature broken or just the tests?

@weizhouapache
Member Author

> @weizhouapache We know several users who use the config drive feature, especially for L2 networks and in VR-less networks. I'm planning to cut RC3 on Monday. Would this PR be ready before that?

@rhtyd it is difficult.

> I guess the question is - is the feature broken or just the tests?

@rhtyd the feature is not broken.
However, there are several test failures in tests/component/test_configdrive.py, which indicate bugs in some scenarios.

What I have fixed in this PR so far:
(1) static NAT cannot be enabled for VMs with configdrive, because applying userdata fails.
(2) VMs are stuck at expunging, because disabling static NAT fails.
(3) the configdrive ISO is not recognized in the VM when a second NIC is added to the VM.

@weizhouapache
Member Author

> @weizhouapache We know several users who use the config drive feature, especially for L2 networks and in VR-less networks. I'm planning to cut RC3 on Monday. Would this PR be ready before that?

@rhtyd I do not have an exact date for when all issues will be fixed and the tests will pass. It seems very difficult to be ready before next Monday. It is better to move it to 4.15.2.0.

@rohityadavcloud
Member

cc @Pearl1594 @weizhouapache does it break any basic config drive feature if tested manually, say config drive with ssh support and user data on L2 and isolated networks (without VR)?

@weizhouapache
Member Author

> cc @Pearl1594 @weizhouapache does it break any basic config drive feature if tested manually, say config drive with ssh support and user data on L2 and isolated networks (without VR)?

@rhtyd from what I have tested, there are no issues with the following VM actions:
(1) deploy
(2) stop/start
(3) reset password
(4) reset sshkeypair
(5) update userdata
Maybe it is good enough for L2 networks.

What might not work (some are confirmed, some are not):
(1) enable static NAT, disable static NAT
(2) plug NIC, unplug NIC, update default NIC
(3) in VPC

@rohityadavcloud
Member

Thanks for confirming @weizhouapache, I'll move it to 4.15.2.

@rohityadavcloud rohityadavcloud modified the milestones: 4.15.1.0, 4.15.2.0 Jun 24, 2021
@rohityadavcloud
Member

@weizhouapache is this ready for review? (The PR is in draft.)

@weizhouapache weizhouapache force-pushed the 4.15-fix-test-configdrive branch from 8781b99 to 845808b Compare June 29, 2021 07:36
@weizhouapache
Member Author

> @weizhouapache is this ready for review (PR is in draft)

@rhtyd @DaanHoogland
I updated this PR to fix some bugs (see description).

Please note I only tested with KVM, not XenServer.

@weizhouapache weizhouapache marked this pull request as ready for review June 29, 2021 07:46
@weizhouapache
Member Author

@blueorangutan package

@blueorangutan

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 411

@weizhouapache
Member Author

@blueorangutan test

@blueorangutan

@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@DaanHoogland
Contributor

@blueorangutan test centos7 xenserver-72

@blueorangutan

@DaanHoogland unsupported parameters provided. Supported mgmt server os are: centos7, centos6, alma8, ubuntu18, suse15, ubuntu20, rocky8, centos8. Supported hypervisors are: kvm-centos6, kvm-centos7, kvm-centos8, kvm-rocky8, kvm-alma8, kvm-ubuntu18, kvm-ubuntu20, kvm-suse15, vmware-55u3, vmware-60u2, vmware-65u2, vmware-67u3, vmware-70u1, xenserver-65sp1, xenserver-71, xenserver-74, xcpng74, xcpng76, xcpng80, xcpng81

@DaanHoogland
Contributor

@blueorangutan test centos7 xenserver-74

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + xenserver-74) has been kicked to run smoke tests

configdrive = disk;
}
}
if (configdrive != null) {
Contributor

We could invert the logic here to reduce block complexity.

Contributor

I like your thinking @GutoVeronezi , but in this case that would leave a return in the middle of the method. I'd rather extract the block into an extra method.
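
The two options weighed in this review thread can be sketched side by side. This is a minimal, hypothetical illustration: the class, the `Disk` type, and the method names are invented for the example and are not the PR's code. It assumes the `if (configdrive != null)` block is the last step of the method, so both variants behave the same.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigDriveRefactorSketch {

    // Hypothetical stand-in for a disk definition; not a CloudStack class.
    static final class Disk {
        final String label;
        Disk(String label) { this.label = label; }
    }

    // Mirrors the loop in the reviewed fragment: remember the config drive disk, if any.
    static Disk findConfigDrive(List<Disk> disks) {
        Disk configdrive = null;
        for (Disk disk : disks) {
            if ("configdrive".equals(disk.label)) {
                configdrive = disk;
            }
        }
        return configdrive;
    }

    // Option A: inverted condition. Less nesting, but a return in the middle of the method.
    static List<String> withEarlyReturn(List<Disk> disks) {
        List<String> actions = new ArrayList<>();
        Disk configdrive = findConfigDrive(disks);
        if (configdrive == null) {
            return actions;
        }
        actions.add("detach " + configdrive.label);
        actions.add("attach " + configdrive.label);
        return actions;
    }

    // Option B: keep the original condition and move the block body into its own method.
    static List<String> withExtractedMethod(List<Disk> disks) {
        List<String> actions = new ArrayList<>();
        Disk configdrive = findConfigDrive(disks);
        if (configdrive != null) {
            reattach(configdrive, actions);
        }
        return actions;
    }

    static void reattach(Disk configdrive, List<String> actions) {
        actions.add("detach " + configdrive.label);
        actions.add("attach " + configdrive.label);
    }

    public static void main(String[] args) {
        List<Disk> disks = List.of(new Disk("root"), new Disk("configdrive"));
        System.out.println(withEarlyReturn(disks));
        System.out.println(withExtractedMethod(disks));
    }
}
```

Both variants produce the same list of actions; the difference is only where the complexity lives, which is the point of disagreement in the thread.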

@blueorangutan

Trillian test result (tid-1156)
Environment: xenserver-74 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41529 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5144-t1156-xenserver-74.zip
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Smoke tests completed. 86 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_11_migrate_volume_and_change_offering | Error | 9.53 | test_volumes.py |

@blueorangutan

Trillian test result (tid-1154)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 61026 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5144-t1154-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_network.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_router_dhcphosts.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 84 look OK, 3 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 | Failure | 430.44 | test_internal_lb.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 407.03 | test_privategw_acl.py |
| test_04_rvpc_privategw_static_routes | Failure | 378.33 | test_privategw_acl.py |
| test_router_dhcphosts | Failure | 221.71 | test_router_dhcphosts.py |
| ContextSuite context=TestRouterDHCPHosts>:teardown | Error | 235.21 | test_router_dhcphosts.py |

@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 532

Contributor

@DaanHoogland DaanHoogland left a comment

cltgm, going to test this (starting) today

@blueorangutan

Trillian test result (tid-1251)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 45827 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5144-t1251-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 86 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_invalid_upgrade_kubernetes_cluster | Failure | 3612.47 | test_kubernetes_clusters.py |
| test_02_deploy_and_upgrade_kubernetes_cluster | Failure | 3607.19 | test_kubernetes_clusters.py |
| test_03_deploy_and_scale_kubernetes_cluster | Failure | 0.05 | test_kubernetes_clusters.py |
| test_04_basic_lifecycle_kubernetes_cluster | Failure | 0.05 | test_kubernetes_clusters.py |
| test_05_delete_kubernetes_cluster | Failure | 0.05 | test_kubernetes_clusters.py |
| test_07_deploy_kubernetes_ha_cluster | Failure | 0.04 | test_kubernetes_clusters.py |
| test_08_deploy_and_upgrade_kubernetes_ha_cluster | Failure | 0.04 | test_kubernetes_clusters.py |
| test_09_delete_kubernetes_ha_cluster | Failure | 0.04 | test_kubernetes_clusters.py |
| ContextSuite context=TestKubernetesCluster>:teardown | Error | 85.77 | test_kubernetes_clusters.py |

Member

@rohityadavcloud rohityadavcloud left a comment

+1 based on Daan's remarks/testing

@rohityadavcloud rohityadavcloud merged commit cf0f1fe into apache:4.15 Jul 15, 2021