
Conversation

@DaanHoogland
Contributor

Description

This PR cleans up the StoragePoolAutomationImpl class to reduce its two big methods and reuse some code.
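For readers unfamiliar with the class, the sketch below illustrates the general extract-method shape of this kind of cleanup. It is a hypothetical illustration only; the class, method names, and bodies are placeholders, not the actual helpers introduced by this PR.

import java.util.List;

/**
 * Hypothetical sketch of the extract-method pattern; the names below are
 * illustrative only and are not the actual helpers added by this PR.
 */
class PoolMaintenanceSketch {

    boolean maintain(List<String> hosts, List<String> runningVms) {
        // Formerly inlined in one large method; now each step is a focused helper.
        notifyHosts(hosts, false);            // stop heartbeats to the pool
        handleVmsForMaintenance(runningVms);  // stop or migrate VMs, record work items
        return true;
    }

    boolean cancelMaintain(List<String> hosts, List<String> stoppedVms) {
        notifyHosts(hosts, true);               // re-add heartbeats to the pool
        handleVmsAfterMaintenance(stoppedVms);  // restart VMs stopped for maintenance
        return true;
    }

    // Small helpers shared or mirrored by both entry points instead of duplicated inline code.
    private void notifyHosts(List<String> hosts, boolean add) {
        hosts.forEach(h -> System.out.printf("ModifyStoragePool add=%s -> host %s%n", add, h));
    }

    private void handleVmsForMaintenance(List<String> vms) {
        vms.forEach(vm -> System.out.println("stopping for maintenance: " + vm));
    }

    private void handleVmsAfterMaintenance(List<String> vms) {
        vms.forEach(vm -> System.out.println("restarting after maintenance: " + vm));
    }
}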

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 0% with 151 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.17%. Comparing base (4d95f08) to head (c2d4461).
⚠️ Report is 30 commits behind head on 4.20.

Files with missing lines Patch % Lines
...a/com/cloud/storage/StoragePoolAutomationImpl.java 0.00% 151 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #11789   +/-   ##
=========================================
  Coverage     16.17%   16.17%           
- Complexity    13297    13298    +1     
=========================================
  Files          5656     5656           
  Lines        498331   498326    -5     
  Branches      60476    60463   -13     
=========================================
  Hits          80591    80591           
+ Misses       408767   408763    -4     
+ Partials       8973     8972    -1     
Flag Coverage Δ
uitests 4.00% <ø> (ø)
unittests 17.02% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@DaanHoogland changed the title from "refactor storapool automation" to "refactor storepool automation" Oct 6, 2025
@DaanHoogland marked this pull request as ready for review October 7, 2025 07:18
Contributor

@JoaoJandre left a comment


clgtm, did not test it.

@DaanHoogland requested a review from Copilot October 8, 2025 13:25
@DaanHoogland added this to the 4.20.2 milestone Oct 8, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR refactors the StoragePoolAutomationImpl class by breaking down its two large methods (maintain and cancelMaintain) into smaller, more focused helper methods and consolidating duplicated code.

Key changes:

  • Extracted multiple helper methods from the large maintain and cancelMaintain methods
  • Consolidated VM handling logic into reusable methods
  • Removed unused imports and injected dependencies


@github-actions

github-actions bot commented Oct 9, 2025

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.


@RosiKyu
Collaborator

RosiKyu commented Oct 13, 2025

@blueorangutan test

@blueorangutan

@rosi-shapeblue a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-14651)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 57842 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11789-t14651-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_enableHumanReadableLogs Error 0.31 test_human_readable_logs.py

@RosiKyu
Collaborator

RosiKyu commented Oct 15, 2025

1. Prepare & Cancel Maintenance on Primary Storage

Action: Put pool in maintenance → cancel.

Expected Result: Pool transitions Up → PrepareForMaintenance → Up cleanly, no errors.

Actual Result

  • Smooth transition: Up → PrepareForMaintenance → Up.
  • No ErrorInMaintenance.
  • No exceptions in management-server.log.
  • Pool fully usable after cancel.

Evidence

  • Pools are in Up state
(localcloud) 🐱 > list storagepools
{
  "count": 2,
  "storagepool": [
    {
      "clusterid": "46169705-e70d-46ff-ad6d-1cb31fc6423c",
      "clustername": "p1-c1",
      "created": "2025-10-15T13:38:55+0000",
      "disksizeallocated": 5243076688,
      "disksizetotal": 2898029182976,
      "disksizeused": 2158164443136,
      "hasannotations": false,
      "hypervisor": "KVM",
      "id": "cca423c8-b72e-36f2-8931-60434242a9ca",
      "ipaddress": "10.0.32.4",
      "istagarule": false,
      "managed": false,
      "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "overprovisionfactor": "2.0",
      "path": "/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "podid": "0e13ddcc-d5ef-444a-a358-0c6ac6e9c744",
      "podname": "Pod1",
      "provider": "DefaultPrimary",
      "scope": "CLUSTER",
      "state": "Up",
      "storagecapabilities": {
        "VOLUME_SNAPSHOT_QUIESCEVM": "false"
      },
      "type": "NetworkFilesystem",
      "zoneid": "36c95913-6c6d-400c-9d7e-12d8ac72fc35",
      "zonename": "ref-trl-9699-k-Mol8-rositsa-kyuchukova"
    },
    {
      "clusterid": "46169705-e70d-46ff-ad6d-1cb31fc6423c",
      "clustername": "p1-c1",
      "created": "2025-10-15T13:38:56+0000",
      "disksizeallocated": 5243076688,
      "disksizetotal": 2898029182976,
      "disksizeused": 2158164443136,
      "hasannotations": false,
      "hypervisor": "KVM",
      "id": "5e33de91-2078-34b9-b954-c6646408bf99",
      "ipaddress": "10.0.32.4",
      "istagarule": false,
      "managed": false,
      "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "overprovisionfactor": "2.0",
      "path": "/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "podid": "0e13ddcc-d5ef-444a-a358-0c6ac6e9c744",
      "podname": "Pod1",
      "provider": "DefaultPrimary",
      "scope": "CLUSTER",
      "state": "Up",
      "storagecapabilities": {
        "VOLUME_SNAPSHOT_QUIESCEVM": "false"
      },
      "type": "NetworkFilesystem",
      "zoneid": "36c95913-6c6d-400c-9d7e-12d8ac72fc35",
      "zonename": "ref-trl-9699-k-Mol8-rositsa-kyuchukova"
    }
  ]
}
  • Trigger prepare for maintenance -> Pool transitions to PrepareForMaintenance
(localcloud) 🐱 > enable storagemaintenance id=5e33de91-2078-34b9-b954-c6646408bf99
{
  "storagepool": {
    "clusterid": "46169705-e70d-46ff-ad6d-1cb31fc6423c",
    "clustername": "p1-c1",
    "created": "2025-10-15T13:38:56+0000",
    "disksizeallocated": 5243076688,
    "disksizetotal": 2898029182976,
    "disksizeused": 2161114087424,
    "hasannotations": false,
    "hypervisor": "KVM",
    "id": "5e33de91-2078-34b9-b954-c6646408bf99",
    "ipaddress": "10.0.32.4",
    "istagarule": false,
    "jobid": "6f934724-025d-44a5-9ad6-0c43f6fd8c2b",
    "jobstatus": 0,
    "managed": false,
    "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "overprovisionfactor": "2.0",
    "path": "/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "podid": "0e13ddcc-d5ef-444a-a358-0c6ac6e9c744",
    "podname": "Pod1",
    "provider": "DefaultPrimary",
    "scope": "CLUSTER",
    "state": "Maintenance",
    "type": "NetworkFilesystem",
    "zoneid": "36c95913-6c6d-400c-9d7e-12d8ac72fc35",
    "zonename": "ref-trl-9699-k-Mol8-rositsa-kyuchukova"
  }
}
  • Verify Pool state after command is triggered
(localcloud) 🐱 > list storagepools id=5e33de91-2078-34b9-b954-c6646408bf99 filter=state
{
  "count": 1,
  "storagepool": [
    {
      "state": "Maintenance"
    }
  ]
}
(screenshot)
  • Cancel maintenance
(localcloud) 🐱 > cancel storagemaintenance id=5e33de91-2078-34b9-b954-c6646408bf99
{
  "storagepool": {
    "clusterid": "46169705-e70d-46ff-ad6d-1cb31fc6423c",
    "clustername": "p1-c1",
    "created": "2025-10-15T13:38:56+0000",
    "disksizeallocated": 196688,
    "disksizetotal": 2898029182976,
    "disksizeused": 2168474042368,
    "hasannotations": false,
    "hypervisor": "KVM",
    "id": "5e33de91-2078-34b9-b954-c6646408bf99",
    "ipaddress": "10.0.32.4",
    "istagarule": false,
    "jobid": "ff7026cb-ee1f-4520-bfc4-2a854452d05b",
    "jobstatus": 0,
    "managed": false,
    "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "overprovisionfactor": "2.0",
    "path": "/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "podid": "0e13ddcc-d5ef-444a-a358-0c6ac6e9c744",
    "podname": "Pod1",
    "provider": "DefaultPrimary",
    "scope": "CLUSTER",
    "state": "Up",
    "type": "NetworkFilesystem",
    "zoneid": "36c95913-6c6d-400c-9d7e-12d8ac72fc35",
    "zonename": "ref-trl-9699-k-Mol8-rositsa-kyuchukova"
  }
}
  • Verify Pool transitions back to Up
(localcloud) 🐱 > list storagepools id=5e33de91-2078-34b9-b954-c6646408bf99 filter=state
{
  "count": 1,
  "storagepool": [
    {
      "state": "Up"
    }
  ]
}
(screenshot)
  • Destroy & recreate the CPVM
(localcloud) 🐱 > destroy systemvm id=4fc85d47-094f-47f5-8eb9-ecd4fb0753a3
{
  "systemvm": {
    "arch": "x86_64",
    "created": "2025-10-15T13:40:26+0000",
    "dns1": "10.0.32.1",
    "dns2": "8.8.8.8",
    "hasannotations": false,
    "hostcontrolstate": "Enabled",
    "hostid": "21507531-193b-4f7d-a34c-e4d7f1e764e2",
    "hostname": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2",
    "hypervisor": "KVM",
    "id": "4fc85d47-094f-47f5-8eb9-ecd4fb0753a3",
    "isdynamicallyscalable": false,
    "name": "v-1-VM",
    "podid": "0e13ddcc-d5ef-444a-a358-0c6ac6e9c744",
    "podname": "Pod1",
    "serviceofferingid": "0e9d510e-86ec-4b8d-971f-856c7a0ecbce",
    "serviceofferingname": "System Offering For Console Proxy",
    "state": "Running",
    "systemvmtype": "consoleproxy",
    "templateid": "33e2baf6-a9c9-11f0-9687-1e0035000318",
    "templatename": "SystemVM Template (KVM)",
    "zoneid": "36c95913-6c6d-400c-9d7e-12d8ac72fc35",
    "zonename": "ref-trl-9699-k-Mol8-rositsa-kyuchukova"
  }
}
(screenshots)
  • CPVM volume confirmed on ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2. CPVM successfully recreated and reached Running state after maintenance cancel - pool operational.
[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1 ~]# virsh dumpxml v-3-VM | grep "source file"
      <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/02095b34-2f4a-4665-b4fe-c29373325dfc' index='2'/>
        <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/33e2baf6-a9c9-11f0-9687-1e0035000318'/>
[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1 ~]#

2. Host Heartbeat Removal / Re-Add

Action: Prepare & cancel maintenance on Pool, while monitoring logs.

Expected Result: Heartbeats are removed on prepare (ModifyStoragePoolCommand with add=false) and re-added on cancel (add=true), with no failures; a sketch of this toggle follows the log evidence below.

Actual Result

  • add:false → heartbeat removal on prepare.
  • add:true → heartbeat re-add on cancel.
  • Commands sent successfully to both hosts (kvm1, kvm2).
  • No errors or failures.

Evidence

2025-10-15 13:38:57,006 DEBUG [c.c.a.t.Request] (pool-11-thread-1:[]) (logid:) Seq 1-6179783113682452493: Sending  { Cmd , MgmtId: 32986238026520, via: 1(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"true","pool":{"id":"2","uuid":"5e33de91-2078-34b9-b954-c6646408bf99","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//5e33de91-2078-34b9-b954-c6646408bf99","wait":"300","bypassHostMaintenance":"false"}}] }
2025-10-15 13:38:57,155 DEBUG [c.c.a.t.Request] (pool-11-thread-1:[]) (logid:) Seq 2-1762596304162127882: Sending  { Cmd , MgmtId: 32986238026520, via: 2(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"true","pool":{"id":"2","uuid":"5e33de91-2078-34b9-b954-c6646408bf99","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//5e33de91-2078-34b9-b954-c6646408bf99","wait":"300","bypassHostMaintenance":"false"}}] }
2025-10-15 14:03:38,642 DEBUG [c.c.a.t.Request] (API-Job-Executor-29:[ctx-1a84fa5f, job-35, ctx-80386708]) (logid:6f934724) Seq 1-6179783113682452572: Sending  { Cmd , MgmtId: 32986238026520, via: 1(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"false","pool":{"id":"2","uuid":"5e33de91-2078-34b9-b954-c6646408bf99","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//5e33de91-2078-34b9-b954-c6646408bf99","wait":"0","bypassHostMaintenance":"false"}}] }
2025-10-15 14:03:38,682 DEBUG [c.c.a.t.Request] (API-Job-Executor-29:[ctx-1a84fa5f, job-35, ctx-80386708]) (logid:6f934724) Seq 2-1762596304162127959: Sending  { Cmd , MgmtId: 32986238026520, via: 2(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"false","pool":{"id":"2","uuid":"5e33de91-2078-34b9-b954-c6646408bf99","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//5e33de91-2078-34b9-b954-c6646408bf99","wait":"0","bypassHostMaintenance":"false"}}] }
2025-10-15 14:11:19,134 DEBUG [c.c.a.t.Request] (API-Job-Executor-30:[ctx-f4e08cc2, job-38, ctx-8c9a2448]) (logid:ff7026cb) Seq 1-6179783113682452599: Sending  { Cmd , MgmtId: 32986238026520, via: 1(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"true","pool":{"id":"2","uuid":"5e33de91-2078-34b9-b954-c6646408bf99","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//5e33de91-2078-34b9-b954-c6646408bf99","wait":"0","bypassHostMaintenance":"false"}}] }
2025-10-15 14:11:19,225 DEBUG [c.c.a.t.Request] (API-Job-Executor-30:[ctx-f4e08cc2, job-38, ctx-8c9a2448]) (logid:ff7026cb) Seq 2-1762596304162127986: Sending  { Cmd , MgmtId: 32986238026520, via: 2(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"true","pool":{"id":"2","uuid":"5e33de91-2078-34b9-b954-c6646408bf99","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//5e33de91-2078-34b9-b954-c6646408bf99","wait":"0","bypassHostMaintenance":"false"}}] }
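To make the log lines above easier to read, here is a minimal, self-contained sketch of the add=false / add=true toggle being verified. The types and method names are hypothetical stand-ins; the real command class is com.cloud.agent.api.ModifyStoragePoolCommand, whose exact constructors are not reproduced here.

import java.util.List;

// Hypothetical stand-in; the real command is com.cloud.agent.api.ModifyStoragePoolCommand.
class HeartbeatToggleSketch {

    record PoolRef(String uuid, String host, String path) {}

    // add=false on prepare-for-maintenance, add=true on cancel.
    static String modifyStoragePoolPayload(boolean add, PoolRef pool) {
        return String.format("ModifyStoragePoolCommand{add=%s, pool=%s, path=%s}",
                add, pool.uuid(), pool.path());
    }

    static void sendToHosts(List<String> hosts, boolean add, PoolRef pool) {
        for (String host : hosts) {
            // In CloudStack this goes through the agent manager; here it is only printed.
            System.out.println("send to " + host + ": " + modifyStoragePoolPayload(add, pool));
        }
    }

    public static void main(String[] args) {
        PoolRef pri2 = new PoolRef("5e33de91-2078-34b9-b954-c6646408bf99", "10.0.32.4",
                "/acs/primary/.../kvm-pri2");
        List<String> hosts = List.of("kvm1", "kvm2");
        sendToHosts(hosts, false, pri2); // prepare: remove heartbeats on both hosts
        sendToHosts(hosts, true, pri2);  // cancel:  re-add heartbeats on both hosts
    }
}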

3. VM Stop/Start Behavior (System & User VMs)

Action:

  • Deploy a user VM while Primary Storage 1(pri1) is in maintenance, ensuring CPVM, SSVM, and the user VM are running on Primary Storage 2 (pri2).
  • Trigger maintenance on pri2 while no other pool is active to force system and user VMs to stop.
  • Re-enable pri1 and verify automatic redeployment of system VMs and manual start of the user VM on pri1.
  • Cancel maintenance on pri2 to restore normal pool availability.

Expected Result: VMs stop or migrate as expected, system VMs restart automatically, no orphaned resources.

  • All system and user VMs initially run on pri2.
  • When pri2 enters maintenance with no alternate pool active, CPVM, SSVM, and the user VM stop gracefully.
  • After pri1 is re-enabled, system VMs are automatically redeployed on pri1, and the user VM can be manually started without errors.
  • All volumes now reside on pri1.
  • No failures or stuck states are observed in management-server.log.

Actual Result

  • With Primary Storage 1 (pri1) disabled, all system and user VMs were confirmed to be using Primary Storage 2 (pri2) as their storage pool.
  • When pri2 was put into maintenance, both system VMs (CPVM and SSVM) and the user VM stopped cleanly without errors.
  • After re-enabling pri1, system VMs were automatically redeployed on pri1 (verified via virsh dumpxml disk paths).
  • The user VM was successfully started manually on pri1 while pri2 remained in maintenance.
  • Canceling maintenance on pri2 restored the pool to Up state without issues.
  • No failures or unexpected behavior were observed in the management server logs or UI.
  • CPVM and SSVM were fully functional after redeploy and recreation, confirming successful recovery flow.

Evidence

  • only pri2 is available
(localcloud) 🐱 > list storagepools filter=state,name,id
{
  "count": 2,
  "storagepool": [
    {
      "id": "cca423c8-b72e-36f2-8931-60434242a9ca",
      "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "state": "Disabled"
    },
    {
      "id": "5e33de91-2078-34b9-b954-c6646408bf99",
      "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "state": "Up"
    }
  ]
}
  • Storage Pool for CPVM is on pri2 (pri2 ID: 5e33de91-2078-34b9-b954-c6646408bf99)
[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1 ~]# virsh dumpxml v-3-VM | grep "source"
      <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/02095b34-2f4a-4665-b4fe-c29373325dfc' index='2'/>
        <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/33e2baf6-a9c9-11f0-9687-1e0035000318'/>
  • Storage Pool for SSVM is on pri2 (pri2 ID: 5e33de91-2078-34b9-b954-c6646408bf99)
[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml s-4-VM | grep source 
     <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/c7e1a2d2-0cf7-4f64-92dc-44085a2b81ee' index='2'/> 
       <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/33e2baf6-a9c9-11f0-9687-1e0035000318'/>
  • Storage placement on the deployed VM is on pri2 (pri2 ID: 5e33de91-2078-34b9-b954-c6646408bf99):
[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml i-2-5-VM | grep "source"
      <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/4093c1d5-7b51-4f31-b16c-465398d41b44' index='2'/>
        <source file='/mnt/5e33de91-2078-34b9-b954-c6646408bf99/33e3a5c0-a9c9-11f0-9687-1e0035000318'/>
  • Trigger maintenance on pri2
(localcloud) 🐱 > enable storagemaintenance id=5e33de91-2078-34b9-b954-c6646408bf99
{
  "storagepool": {
    "clusterid": "46169705-e70d-46ff-ad6d-1cb31fc6423c",
    "clustername": "p1-c1",
    "created": "2025-10-15T13:38:56+0000",
    "disksizeallocated": 24318968016,
    "disksizetotal": 2898029182976,
    "disksizeused": 2201851265024,
    "hasannotations": false,
    "hypervisor": "KVM",
    "id": "5e33de91-2078-34b9-b954-c6646408bf99",
    "ipaddress": "10.0.32.4",
    "istagarule": false,
    "jobid": "42925036-5491-47d1-819b-1388fb62c8e3",
    "jobstatus": 0,
    "managed": false,
    "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "overprovisionfactor": "2.0",
    "path": "/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "podid": "0e13ddcc-d5ef-444a-a358-0c6ac6e9c744",
    "podname": "Pod1",
    "provider": "DefaultPrimary",
    "scope": "CLUSTER",
    "state": "Maintenance",
    "type": "NetworkFilesystem",
    "zoneid": "36c95913-6c6d-400c-9d7e-12d8ac72fc35",
    "zonename": "ref-trl-9699-k-Mol8-rositsa-kyuchukova"
  }
}
(screenshot)
  • User VM & System VMs are Stopped
(screenshots)
  • Re-enable pri1 so system VMs can redeploy
(screenshot)
  • System VMs start redeploying on pri1
    NOTE Destroy and re-create them, if needed
(screenshot)
  • Verify Storage Pool for both System VMs is now pri1, since pri2 is in Maintenance (pri1 ID: cca423c8-b72e-36f2-8931-60434242a9ca)
[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml s-15-VM | grep source
      <source file='/mnt/cca423c8-b72e-36f2-8931-60434242a9ca/58a37398-2021-439f-8d27-36a7cac8b99b' index='2'/>
        <source file='/mnt/cca423c8-b72e-36f2-8931-60434242a9ca/33e2baf6-a9c9-11f0-9687-1e0035000318'/>

and

[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1 ~]# virsh dumpxml v-14-VM | grep source
  <resource>
  </resource>
      <source file='/mnt/cca423c8-b72e-36f2-8931-60434242a9ca/2cb750a3-8005-4998-ac5b-24913dab493f' index='2'/>
        <source file='/mnt/cca423c8-b72e-36f2-8931-60434242a9ca/33e2baf6-a9c9-11f0-9687-1e0035000318'/>
  • Manually start the User VM (test-vm-pri2)
(screenshot)
  • VM starts on pri1 (while pri2 is still in Maintenance)
(screenshots)
  • Cancel maintenance on pri2 (to restore baseline):
(localcloud) 🐱 > cancel storagemaintenance id=5e33de91-2078-34b9-b954-c6646408bf99
{
  "storagepool": {
    "clusterid": "46169705-e70d-46ff-ad6d-1cb31fc6423c",
    "clustername": "p1-c1",
    "created": "2025-10-15T13:38:56+0000",
    "disksizeallocated": 13833208016,
    "disksizetotal": 2898029182976,
    "disksizeused": 2208384942080,
    "hasannotations": false,
    "hypervisor": "KVM",
    "id": "5e33de91-2078-34b9-b954-c6646408bf99",
    "ipaddress": "10.0.32.4",
    "istagarule": false,
    "jobid": "17330882-91f9-4b90-a339-20b6fe15c80b",
    "jobstatus": 0,
    "managed": false,
    "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "overprovisionfactor": "2.0",
    "path": "/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
    "podid": "0e13ddcc-d5ef-444a-a358-0c6ac6e9c744",
    "podname": "Pod1",
    "provider": "DefaultPrimary",
    "scope": "CLUSTER",
    "state": "Up",
    "type": "NetworkFilesystem",
    "zoneid": "36c95913-6c6d-400c-9d7e-12d8ac72fc35",
    "zonename": "ref-trl-9699-k-Mol8-rositsa-kyuchukova"
  }
}
(screenshot)

4. VM-HA Scenario

Action: Enable HA on a user VM → prepare pool for maintenance.

Expected Result: VM automatically restarts on the other pool, stable after cancel.

Actual Result

Evidence

  1. Baseline:
  • User VM is in Running state
(localcloud) 🐱 > list virtualmachines id=4848b36a-bb8c-438b-a93e-82f613a8d6d0 filter=name,instancename,state,hostname
{
  "count": 1,
  "virtualmachine": [
    {
      "hostname": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2",
      "instancename": "i-2-16-VM",
      "name": "test-vm-ha",
      "state": "Running"
    }
  ]
}
  • HA enabled
(screenshot)
  • Primary Storage 1 and Primary Storage 2 are in Up state
(localcloud) 🐱 > list storagepools filter=name,id,state
{
  "count": 2,
  "storagepool": [
    {
      "id": "cca423c8-b72e-36f2-8931-60434242a9ca",
      "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "state": "Up"
    },
    {
      "id": "5e33de91-2078-34b9-b954-c6646408bf99",
      "name": "ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "state": "Up"
    }
  ]
}
  • User VM hosted on Primary Storage 1 (ID: cca423c8-b72e-36f2-8931-60434242a9ca)
[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml i-2-16-VM | grep source
      <source file='/mnt/cca423c8-b72e-36f2-8931-60434242a9ca/a82f054a-8449-4e41-9dab-7aed00033820' index='2'/>
        <source file='/mnt/cca423c8-b72e-36f2-8931-60434242a9ca/33e3a5c0-a9c9-11f0-9687-1e0035000318'/>

  2. Enable Maintenance on Primary Storage 1
(screenshots)
  3. Verify that the user VM without HA enabled is Stopped, while the VM with HA enabled automatically starts on the other pool (pri2)

-> Both VMs remain in Stopped state

Duplicate Key Errors occurred for BOTH VMs:

2025-10-15 21:42:31,610 ERROR [c.c.s.d.StoragePoolWorkDaoImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) DB Exception on: HikariProxyPreparedStatement@962471373 wrapping com.mysql.cj.jdbc.ServerPreparedStatement[940]: INSERT INTO storage_pool_work (storage_pool_work.pool_id, storage_pool_work.vm_id, storage_pool_work.stopped_for_maintenance, storage_pool_work.started_after_maintenance, storage_pool_work.mgmt_server_id) VALUES (1, 5, 0, 0, 32986238026520) java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1-5' for key 'storage_pool_work.pool_id'
2025-10-15 21:42:31,619 ERROR [c.c.s.d.StoragePoolWorkDaoImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) DB Exception on: HikariProxyPreparedStatement@2011553358 wrapping com.mysql.cj.jdbc.ServerPreparedStatement[940]: INSERT INTO storage_pool_work (storage_pool_work.pool_id, storage_pool_work.vm_id, storage_pool_work.stopped_for_maintenance, storage_pool_work.started_after_maintenance, storage_pool_work.mgmt_server_id) VALUES (1, 16, 0, 0, 32986238026520) java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1-16' for key 'storage_pool_work.pool_id'

Full log:

[root@ref-trl-9699-k-Mol8-rositsa-kyuchukova-mgmt1 ~]# tail -f /var/log/cloudstack/management/management-server.log | grep -E "ModifyStoragePool|StoragePoolWork|PrepareForMaintenance|cca423c8"


2025-10-15 21:42:31,465 DEBUG [c.c.a.ApiServlet] (qtp253011924-16:[ctx-a39b4530]) (logid:3fa1d2fc) ===START===  10.0.3.251 -- GET  id=cca423c8-b72e-36f2-8931-60434242a9ca&command=enableStorageMaintenance&response=json&sessionkey=Y25ZLjhJ2abh2eviLYaMNV8zsNg
2025-10-15 21:42:31,497 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (qtp253011924-16:[ctx-a39b4530, ctx-cb6eee94]) (logid:3fa1d2fc) submit async job-115, details: AsyncJob {"accountId":2,"cmd":"org.apache.cloudstack.api.command.admin.storage.PreparePrimaryStorageForMaintenanceCmd","cmdInfo":"{\"response\":\"json\",\"ctxUserId\":\"2\",\"sessionkey\":\"Y25ZLjhJ2abh2eviLYaMNV8zsNg\",\"httpmethod\":\"GET\",\"ctxStartEventId\":\"280\",\"id\":\"cca423c8-b72e-36f2-8931-60434242a9ca\",\"ctxDetails\":\"{\\\"interface com.cloud.storage.StoragePool\\\":\\\"cca423c8-b72e-36f2-8931-60434242a9ca\\\"}\",\"ctxAccountId\":\"2\",\"uuid\":\"cca423c8-b72e-36f2-8931-60434242a9ca\",\"cmdEventType\":\"MAINT.PREPARE.PS\"}","cmdVersion":0,"completeMsid":null,"created":null,"id":115,"initMsid":32986238026520,"instanceId":1,"instanceType":"StoragePool","lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"e5f8a828-8eb2-4696-97e6-32a3fa7856ec"}
2025-10-15 21:42:31,499 DEBUG [c.c.a.ApiServlet] (qtp253011924-16:[ctx-a39b4530, ctx-cb6eee94]) (logid:3fa1d2fc) ===END===  10.0.3.251 -- GET  id=cca423c8-b72e-36f2-8931-60434242a9ca&command=enableStorageMaintenance&response=json&sessionkey=Y25ZLjhJ2abh2eviLYaMNV8zsNg
2025-10-15 21:42:31,499 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl$5] (API-Job-Executor-49:[ctx-42fca5c5, job-115]) (logid:e5f8a828) Executing AsyncJob {"accountId":2,"cmd":"org.apache.cloudstack.api.command.admin.storage.PreparePrimaryStorageForMaintenanceCmd","cmdInfo":"{\"response\":\"json\",\"ctxUserId\":\"2\",\"sessionkey\":\"Y25ZLjhJ2abh2eviLYaMNV8zsNg\",\"httpmethod\":\"GET\",\"ctxStartEventId\":\"280\",\"id\":\"cca423c8-b72e-36f2-8931-60434242a9ca\",\"ctxDetails\":\"{\\\"interface com.cloud.storage.StoragePool\\\":\\\"cca423c8-b72e-36f2-8931-60434242a9ca\\\"}\",\"ctxAccountId\":\"2\",\"uuid\":\"cca423c8-b72e-36f2-8931-60434242a9ca\",\"cmdEventType\":\"MAINT.PREPARE.PS\"}","cmdVersion":0,"completeMsid":null,"created":null,"id":115,"initMsid":32986238026520,"instanceId":1,"instanceType":"StoragePool","lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"e5f8a828-8eb2-4696-97e6-32a3fa7856ec"}
2025-10-15 21:42:31,524 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) Wait time setting on com.cloud.agent.api.ModifyStoragePoolCommand is 1800 seconds
2025-10-15 21:42:31,525 DEBUG [c.c.a.t.Request] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) Seq 1-6179783113682454219: Sending  { Cmd , MgmtId: 32986238026520, via: 1(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"false","pool":{"id":"1","uuid":"cca423c8-b72e-36f2-8931-60434242a9ca","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//cca423c8-b72e-36f2-8931-60434242a9ca","wait":"0","bypassHostMaintenance":"false"}}] }
2025-10-15 21:42:31,582 DEBUG [c.c.a.t.Request] (AgentManager-Handler-1:[]) (logid:) Seq 1-6179783113682454219: Processing:  { Ans: , MgmtId: 32986238026520, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.ModifyStoragePoolAnswer":{"poolInfo":{"host":"10.0.32.4","localPath":"/mnt//cca423c8-b72e-36f2-8931-60434242a9ca","hostPath":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1","poolType":"NetworkFilesystem","capacityBytes":"(2.6357 TB) 2898029182976","availableBytes":"(633.45 GB) 680159870976"},"templateInfo":{},"datastoreClusterChildren":[],"result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2025-10-15 21:42:31,582 DEBUG [c.c.a.t.Request] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) Seq 1-6179783113682454219: Received:  { Ans: , MgmtId: 32986238026520, via: 1(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm1), Ver: v1, Flags: 10, { ModifyStoragePoolAnswer } }
2025-10-15 21:42:31,582 DEBUG [c.c.s.StoragePoolAutomationImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) ModifyStoragePool succeeded for removing
2025-10-15 21:42:31,583 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) Wait time setting on com.cloud.agent.api.ModifyStoragePoolCommand is 1800 seconds
2025-10-15 21:42:31,585 DEBUG [c.c.a.t.Request] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) Seq 2-1762596304162129058: Sending  { Cmd , MgmtId: 32986238026520, via: 2(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":"false","pool":{"id":"1","uuid":"cca423c8-b72e-36f2-8931-60434242a9ca","host":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1","port":"2049","type":"NetworkFilesystem"},"localPath":"/mnt//cca423c8-b72e-36f2-8931-60434242a9ca","wait":"0","bypassHostMaintenance":"false"}}] }
2025-10-15 21:42:31,605 DEBUG [c.c.a.t.Request] (AgentManager-Handler-4:[]) (logid:) Seq 2-1762596304162129058: Processing:  { Ans: , MgmtId: 32986238026520, via: 2, Ver: v1, Flags: 10, [{"com.cloud.agent.api.ModifyStoragePoolAnswer":{"poolInfo":{"host":"10.0.32.4","localPath":"/mnt//cca423c8-b72e-36f2-8931-60434242a9ca","hostPath":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1","poolType":"NetworkFilesystem","capacityBytes":"(2.6357 TB) 2898029182976","availableBytes":"(619.93 GB) 665644433408"},"templateInfo":{},"datastoreClusterChildren":[],"result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2025-10-15 21:42:31,605 DEBUG [c.c.a.t.Request] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) Seq 2-1762596304162129058: Received:  { Ans: , MgmtId: 32986238026520, via: 2(ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm2), Ver: v1, Flags: 10, { ModifyStoragePoolAnswer } }
2025-10-15 21:42:31,605 DEBUG [c.c.s.StoragePoolAutomationImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) ModifyStoragePool succeeded for removing
2025-10-15 21:42:31,610 ERROR [c.c.s.d.StoragePoolWorkDaoImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) DB Exception on: HikariProxyPreparedStatement@962471373 wrapping com.mysql.cj.jdbc.ServerPreparedStatement[940]: INSERT INTO storage_pool_work (storage_pool_work.pool_id, storage_pool_work.vm_id, storage_pool_work.stopped_for_maintenance, storage_pool_work.started_after_maintenance, storage_pool_work.mgmt_server_id) VALUES (1, 5, 0, 0, 32986238026520) java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1-5' for key 'storage_pool_work.pool_id'
2025-10-15 21:42:31,619 ERROR [c.c.s.d.StoragePoolWorkDaoImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) DB Exception on: HikariProxyPreparedStatement@2011553358 wrapping com.mysql.cj.jdbc.ServerPreparedStatement[940]: INSERT INTO storage_pool_work (storage_pool_work.pool_id, storage_pool_work.vm_id, storage_pool_work.stopped_for_maintenance, storage_pool_work.started_after_maintenance, storage_pool_work.mgmt_server_id) VALUES (1, 16, 0, 0, 32986238026520) java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1-16' for key 'storage_pool_work.pool_id'
2025-10-15 21:44:22,363 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-49:[ctx-42fca5c5, job-115, ctx-e2efdf6f]) (logid:e5f8a828) Complete async job-115, jobStatus: SUCCEEDED, resultCode: 0, result: org.apache.cloudstack.api.response.StoragePoolResponse/storagepool/{"id":"cca423c8-b72e-36f2-8931-60434242a9ca","zoneid":"36c95913-6c6d-400c-9d7e-12d8ac72fc35","zonename":"ref-trl-9699-k-Mol8-rositsa-kyuchukova","podid":"0e13ddcc-d5ef-444a-a358-0c6ac6e9c744","podname":"Pod1","name":"ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1","ipaddress":"10.0.32.4","path":"/acs/primary/ref-trl-9699-k-Mol8-rositsa-kyuchukova/ref-trl-9699-k-Mol8-rositsa-kyuchukova-kvm-pri1","created":"2025-10-15T13:38:55+0000","type":"NetworkFilesystem","clusterid":"46169705-e70d-46ff-ad6d-1cb31fc6423c","clustername":"p1-c1","disksizetotal":"2898029182976","disksizeallocated":"17180262608","disksizeused":"2217593536512","istagarule":"false","state":"Maintenance","scope":"CLUSTER","overprovisionfactor":"2.0","hypervisor":"KVM","provider":"DefaultPrimary","managed":"false","hasannotations":"false","jobid":"e5f8a828-8eb2-4696-97e6-32a3fa7856ec","jobstatus":"0"}

Both VMs were stopped but NOT restarted:

i-2-5-VM (non-HA): stopped_for_maintenance=1, started_after_maintenance=0 (expected)
i-2-16-VM (HA VM): stopped_for_maintenance=1, started_after_maintenance=0 (not expected; it should have restarted)

DB check: old work records from earlier tests were apparently never cleaned up (see the sketch after the table below):

Rows 1-5: Old records from pool_id=2 (previous tests)
Rows 6-11: Current test records for pool_id=1

mysql> SELECT * FROM storage_pool_work;
+----+---------+-------+-------------------------+---------------------------+----------------+
| id | pool_id | vm_id | stopped_for_maintenance | started_after_maintenance | mgmt_server_id |
+----+---------+-------+-------------------------+---------------------------+----------------+
|  1 |       2 |     2 |                       1 |                         1 | 32986238026520 |
|  2 |       2 |     3 |                       1 |                         0 | 32986238026520 |
|  3 |       2 |     4 |                       1 |                         0 | 32986238026520 |
|  4 |       2 |     5 |                       1 |                         1 | 32986238026520 |
|  5 |       2 |     6 |                       1 |                         1 | 32986238026520 |
|  6 |       1 |    14 |                       1 |                         1 | 32986238026520 |
|  7 |       1 |    15 |                       1 |                         1 | 32986238026520 |
|  8 |       1 |     6 |                       1 |                         1 | 32986238026520 |
|  9 |       1 |     5 |                       1 |                         0 | 32986238026520 |
| 11 |       1 |    16 |                       1 |                         0 | 32986238026520 |
+----+---------+-------+-------------------------+---------------------------+----------------+
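Assuming the "Duplicate entry '1-5'" message indicates a unique key over (pool_id, vm_id), one plausible mitigation is to make the work-record insert idempotent, or to purge a pool's old rows before a new maintenance cycle. The JDBC sketch below only illustrates the idempotent-insert idea; it is not the DAO code CloudStack uses, and the connection details are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class StoragePoolWorkInsertSketch {

    // Mirrors the failing INSERT from the log, but made idempotent so an existing
    // (pool_id, vm_id) row is reset instead of raising a duplicate-entry exception.
    static void recordMaintenanceWorkItem(Connection conn, long poolId, long vmId,
                                          long mgmtServerId) throws SQLException {
        String sql = "INSERT INTO storage_pool_work "
                + "(pool_id, vm_id, stopped_for_maintenance, started_after_maintenance, mgmt_server_id) "
                + "VALUES (?, ?, 0, 0, ?) "
                + "ON DUPLICATE KEY UPDATE stopped_for_maintenance = 0, "
                + "started_after_maintenance = 0, mgmt_server_id = VALUES(mgmt_server_id)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, poolId);
            ps.setLong(2, vmId);
            ps.setLong(3, mgmtServerId);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder credentials; adjust for a real environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/cloud", "cloud", "password")) {
            recordMaintenanceWorkItem(conn, 1L, 5L, 32986238026520L);
        }
    }
}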

@weizhouapache modified the milestones: 4.20.2, 4.20.3 Oct 23, 2025
@apache deleted 9 comments from blueorangutan Dec 8, 2025
@DaanHoogland
Contributor Author

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15949

@DaanHoogland
Contributor Author

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@apache deleted a comment from blueorangutan Dec 9, 2025
@RosiKyu
Collaborator

RosiKyu commented Dec 9, 2025

@DaanHoogland, I tested the HA scenario on 4.20.2

Action: Enable HA on a user VM → prepare primary storage pool for maintenance.
Expected Result: HA VM automatically restarts on the alternate storage pool, stable after cancel maintenance.
Actual Result: HA VM remains in Stopped state - does NOT auto-restart on alternate pool.

This is not a regression caused by the PR.

Evidence

Baseline: Environment Setup

Zone, Cluster, and Hosts confirmed operational:

Primary Storage 1 and Primary Storage 2 are in Up state:

(localcloud) 🐱 > list storagepools filter=name,id,state
{
  "count": 2,
  "storagepool": [
    {
      "id": "1383bf9b-b82e-3566-9bca-769b4b69227d",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "state": "Up"
    },
    {
      "id": "d2d00507-fb67-3329-a32d-775c637a0ba3",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "state": "Up"
    }
  ]
}

Database clean - no stale records:

mysql> SELECT * FROM storage_pool_work;
Empty set (0.00 sec)

Test Setup: Created HA-enabled Service Offering and Deployed Test VMs

(localcloud) 🐱 > create serviceoffering name=Small-HA-Instance displaytext="Small Instance with HA" cpunumber=1 cpuspeed=1000 memory=512 offerha=true
{
  "serviceoffering": {
    "id": "e601bccc-9446-4c31-bae5-5f873bb6c712",
    "name": "Small-HA-Instance",
    "offerha": true
  }
}

Deployed two VMs - one with HA, one without:

(localcloud) 🐱 > list virtualmachines filter=name,id,instancename,state,hostname,haenable
{
  "count": 2,
  "virtualmachine": [
    {
      "haenable": false,
      "hostname": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2",
      "id": "b65f682e-08de-4e8f-afce-004f4d4054f2",
      "instancename": "i-2-3-VM",
      "name": "test-vm-no-ha",
      "state": "Running"
    },
    {
      "haenable": true,
      "hostname": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2",
      "id": "b7cffaf4-04c6-4c2a-9bd5-2c2ac7fe03df",
      "instancename": "i-2-5-VM",
      "name": "test-vm-ha",
      "state": "Running"
    }
  ]
}

Pre-Maintenance: VM Storage Location

HA VM (i-2-5-VM) hosted on Primary Storage 1 (ID: 1383bf9b-b82e-3566-9bca-769b4b69227d):

[root@ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml i-2-5-VM | grep "source file"
      <source file='/mnt/1383bf9b-b82e-3566-9bca-769b4b69227d/45e68b66-c706-43e9-82bb-5e421411d072' index='2'/>
        <source file='/mnt/1383bf9b-b82e-3566-9bca-769b4b69227d/a4658949-d4f9-11f0-b96d-1e001100042d'/>

Non-HA VM (i-2-3-VM) hosted on Primary Storage 2 (ID: d2d00507-fb67-3329-a32d-775c637a0ba3):

[root@ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh dumpxml i-2-3-VM | grep "source file"
      <source file='/mnt/d2d00507-fb67-3329-a32d-775c637a0ba3/21b3676b-993d-4245-ad01-73de2afaba51' index='2'/>
        <source file='/mnt/d2d00507-fb67-3329-a32d-775c637a0ba3/a4658949-d4f9-11f0-b96d-1e001100042d'/>

Enabled Maintenance on Primary Storage 1

(localcloud) 🐱 > enableStorageMaintenance id=1383bf9b-b82e-3566-9bca-769b4b69227d
{
  "storagepool": {
    "id": "1383bf9b-b82e-3566-9bca-769b4b69227d",
    "jobid": "e2253689-fe35-4967-9651-a43016484f8a",
    "jobstatus": 0,
    "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri1",
    "state": "Maintenance"
  }
}

Post-Maintenance Results

Storage Pool State:

(localcloud) 🐱 > list storagepools filter=name,id,state
{
  "count": 2,
  "storagepool": [
    {
      "id": "1383bf9b-b82e-3566-9bca-769b4b69227d",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri1",
      "state": "Maintenance"
    },
    {
      "id": "d2d00507-fb67-3329-a32d-775c637a0ba3",
      "name": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm-pri2",
      "state": "Up"
    }
  ]
}

VM States - HA VM did NOT auto-restart:

(localcloud) 🐱 > list virtualmachines filter=name,instancename,state,hostname,haenable
{
  "count": 2,
  "virtualmachine": [
    {
      "haenable": false,
      "hostname": "ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2",
      "instancename": "i-2-3-VM",
      "name": "test-vm-no-ha",
      "state": "Running"
    },
    {
      "haenable": true,
      "instancename": "i-2-5-VM",
      "name": "test-vm-ha",
      "state": "Stopped"
    }
  ]
}

HA VM not running on hypervisor:

[root@ref-trl-10333-k-Mol8-rositsa-kyuchukova-kvm2 ~]# virsh list --all
 Id   Name       State
--------------------------
 1    v-2-VM     running
 2    r-4-VM     running
 3    i-2-3-VM   running

Database Evidence:

mysql> SELECT * FROM storage_pool_work;
+----+---------+-------+-------------------------+---------------------------+----------------+
| id | pool_id | vm_id | stopped_for_maintenance | started_after_maintenance | mgmt_server_id |
+----+---------+-------+-------------------------+---------------------------+----------------+
|  1 |       1 |     1 |                       1 |                         1 | 32985634047021 |
|  2 |       1 |     5 |                       1 |                         0 | 32985634047021 |
+----+---------+-------+-------------------------+---------------------------+----------------+
2 rows in set (0.00 sec)

vm_id=5 (HA VM i-2-5-VM): stopped_for_maintenance=1, started_after_maintenance=0 ✗ Should be 1
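For completeness, a small diagnostic query that lists the VMs a pool's maintenance stopped but never restarted, i.e. the rows the cancel/restart path would still need to act on. Again a JDBC illustration with placeholder connection details, not CloudStack's DAO.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PendingRestartCheck {

    // Lists VMs that were stopped for a pool's maintenance but never restarted.
    static void printPendingRestarts(Connection conn, long poolId) throws SQLException {
        String sql = "SELECT vm_id FROM storage_pool_work "
                + "WHERE pool_id = ? AND stopped_for_maintenance = 1 "
                + "AND started_after_maintenance = 0";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, poolId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println("still pending restart: vm_id=" + rs.getLong("vm_id"));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder credentials; adjust for a real environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/cloud", "cloud", "password")) {
            printPendingRestarts(conn, 1L); // pool_id=1 as in the table above
        }
    }
}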

Collaborator

@RosiKyu left a comment


LGTM

Test results can be found here: #11789 (comment)

NOTE: The HA issue noted in the test results is not a regression caused by this PR.

@blueorangutan

[SF] Trillian test result (tid-14929)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 55375 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11789-t14929-kvm-ol8.zip
Smoke tests completed. 134 look OK, 7 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_DeployVmAntiAffinityGroup_in_project Error 104.29 test_affinity_groups_projects.py
test_DeployVmAntiAffinityGroup Error 38.73 test_affinity_groups.py
test_01_host_tags Error 62.84 test_host_tags.py
test_03_deploy_and_scale_kubernetes_cluster Failure 38.23 test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster Failure 0.07 test_kubernetes_clusters.py
test_01_non_strict_host_anti_affinity Failure 152.16 test_nonstrict_affinity_group.py
test_02_non_strict_host_affinity Error 89.81 test_nonstrict_affinity_group.py
ContextSuite context=TestMigrateVMStrictTags>:setup Error 0.00 test_vm_strict_host_tags.py
test_hostha_enable_ha_when_host_in_maintenance Error 304.96 test_hostha_kvm.py

@blueorangutan

[SF] Trillian test result (tid-14956)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 54195 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11789-t14956-kvm-ol8.zip
Smoke tests completed. 139 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_uservm_host_control_state Failure 16.86 test_host_control_state.py
ContextSuite context=TestHostControlState>:teardown Error 31.43 test_host_control_state.py
test_01_secure_vm_migration Error 235.88 test_vm_life_cycle.py
test_01_secure_vm_migration Error 235.89 test_vm_life_cycle.py
test_02_unsecure_vm_migration Error 0.02 test_vm_life_cycle.py
test_03_secured_to_nonsecured_vm_migration Error 0.01 test_vm_life_cycle.py
test_04_nonsecured_to_secured_vm_migration Error 0.02 test_vm_life_cycle.py

@DaanHoogland
Contributor Author

Different errors on two consecutive runs; calling those noisy-neighbour artifacts.

@DaanHoogland merged commit 79ebf69 into apache:4.20 Dec 11, 2025
25 of 26 checks passed
@DaanHoogland deleted the storagePoolAutomationRefactors branch December 11, 2025 08:04