VM migration with disks fails between local storage hypervisors if the VM template is deleted #10246

@phsm

Description

problem

Migrating a virtual machine together with its disks between two hypervisors that use local storage fails if the template the VM was created from has been deleted.

I did some research and think I have found the likely root cause.

First, the template lookup at StorageSystemDataMotionStrategy.java:2158 does not find the template.
As a result, the system concludes that this VM does not have a template.

Because of that, the system assumes the VM's disks have no backing files and tries to perform a full clone migration: MigrateKVMAsync.java:128. This seems to be the right migration option; according to the documentation it migrates all of the VM's disks: VIR_MIGRATE_NON_SHARED_INC
But there is a catch: the QCOW2 images have to be pre-created on the destination machine with the correct size.
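The chain of decisions described above can be sketched roughly as follows. This is only an illustration of the reasoning, not CloudStack code: the class, the enum, and the method name are hypothetical, and the real logic lives in the files referenced above.

```java
import java.util.Optional;

public class MigrationModeSketch {
    /** Hypothetical copy modes; these are not CloudStack or libvirt identifiers. */
    enum DiskCopyMode { FULL_COPY, INCREMENTAL_OVER_BACKING_FILE }

    /**
     * Mirrors the reasoning in this issue: if no template is found,
     * the disk is assumed to have no backing file, so a full copy is chosen.
     */
    static DiskCopyMode chooseMode(Optional<String> templateUuid) {
        return templateUuid.isPresent()
                ? DiskCopyMode.INCREMENTAL_OVER_BACKING_FILE
                : DiskCopyMode.FULL_COPY;
    }

    public static void main(String[] args) {
        // Template deleted: the lookup yields nothing, so a full copy is selected.
        System.out.println(chooseMode(Optional.empty()));
        // Template still present: the disk is treated as an overlay on the template.
        System.out.println(chooseMode(Optional.of("example-template-uuid")));
    }
}
```

With a full copy selected, nothing on the destination host provides the disk file, which is where the pre-creation requirement comes in.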

There is a method that copies the template from secondary storage to the destination host: KvmNonManagedStorageDataMotionStrategy.java:220, but there is no method that simply creates an empty QCOW2 file of a given size at a given path.
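To illustrate what the missing step could look like, here is a minimal sketch that pre-creates an empty QCOW2 image by calling the `qemu-img create` command-line tool. It assumes `qemu-img` is installed on the destination host; the class and method names are hypothetical, the size in `main` is made up, and a real fix would have to go through CloudStack's agent and storage layers rather than a standalone program.

```java
import java.io.IOException;
import java.util.List;

public class PrecreateQcow2Sketch {
    /** Builds the argument list for: qemu-img create -f qcow2 <path> <sizeInBytes> */
    static List<String> buildCreateCommand(String path, long virtualSizeBytes) {
        if (virtualSizeBytes <= 0) {
            throw new IllegalArgumentException("virtual size must be positive");
        }
        return List.of("qemu-img", "create", "-f", "qcow2",
                path, Long.toString(virtualSizeBytes));
    }

    /** Runs qemu-img and fails if it exits with a non-zero status. */
    static void precreate(String path, long virtualSizeBytes)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCreateCommand(path, virtualSizeBytes))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("qemu-img exited with status " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the error message in this issue; the 10 GiB size is an assumption.
        String path = "/var/lib/libvirt/images/bb51c27d-ca94-4cda-a74e-cec07295ce26";
        long tenGiB = 10L * 1024 * 1024 * 1024;
        System.out.println(String.join(" ", buildCreateCommand(path, tenGiB)));
        // precreate(path, tenGiB); // uncomment on a host where qemu-img is available
    }
}
```

The virtual size passed to `qemu-img` would have to match the size of the source volume, since the destination image must be created with the correct size.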

versions

4.20

The steps to reproduce the bug

  1. Take two KVM hypervisors with local storage pools.
  2. Create a virtual machine from a template on one of them.
  3. Delete this template after the test virtual machine has started.
  4. Try to migrate the virtual machine with its storage to the other hypervisor.
  5. The migration fails with an error saying that the backing file does not exist on the target machine:
2025-01-22 12:41:08,662 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:[ctx-9e293509, job-226]) (logid:61e8bd24) Unexpected exception while executing org.apache.cloudstack.api.command.admin.vm.MigrateVirtualMachineWithVolumeCmd com.cloud.utils.exception.CloudRuntimeException: Failed to migrate VM [VM instance {"id":18,"instanceName":"i-2-18-VM","type":"User","uuid":"d3b043ff-716a-4bc8-b20e-639853b0babe"}] along with its volumes due to [com.cloud.utils.exception.CloudRuntimeException: Copy volume(s) to storage(s) [{volume: "33", from: "3", to:"4"}] and VM to host [{vm: "18", from: "2", to:"4"}] failed in StorageSystemDataMotionStrategy.copyAsync. Error message: [Exception during migrate: org.libvirt.LibvirtException: Path '/var/lib/libvirt/images/bb51c27d-ca94-4cda-a74e-cec07295ce26' is not accessible: No such file or directory].].
	at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.migrateVolumes(VolumeOrchestrator.java:1444)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
....

What to do about it?

Storage migration should keep working after the VM's template has been removed.

I would like to try to develop a fix myself, but I do not know CloudStack internals well enough to know where to look.
I would need some guidance on how to develop the fix properly, for example pointers to the right files.
