vmm: cpu_manager: massively accelerate .pause() #7290

Merged

alyssais merged 1 commit into cloud-hypervisor:main (Aug 20, 2025)
Conversation
454da2c to 7c6dffb
With 254 vCPUs, pausing now takes ~4ms instead of >254ms. This improvement is visible when running `ch-remote pause` and is particularly important for live migration, where every millisecond of downtime matters.

For the wait logic, it is fine to stick to the approach of sleeping 1ms on the first missed ACK, as: 1) we have to wait anyway, and 2) it gives the OS time to schedule a vCPU thread next.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
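The numbers suggest the 1ms waits now overlap instead of accruing once per vCPU: the manager signals every vCPU first and only then waits for the ACKs. A minimal sketch of such a fan-out/ACK pattern, using hypothetical names rather than the actual CpuManager types:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

struct VcpuState {
    pause_requested: AtomicBool, // set by the manager
    paused: AtomicBool,          // ACKed by the vCPU thread
}

fn pause_all(vcpus: &[Arc<VcpuState>]) {
    // Phase 1: request the pause on every vCPU at once instead of
    // pausing them one by one and waiting in between.
    for v in vcpus {
        v.pause_requested.store(true, Ordering::SeqCst);
    }
    // Phase 2: wait for all ACKs. Sleeping 1ms on the first missed
    // ACK gives the OS a chance to schedule a vCPU thread next.
    for v in vcpus {
        while !v.paused.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(1));
        }
    }
}
```

With this shape, the total pause latency is bounded roughly by the slowest single ACK, not by the sum of 254 sequential waits.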
7c6dffb to 59eabe4
alyssais approved these changes Aug 20, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Sep 1, 2025

It is odd that for pause(), the CpuManager waited via `state.paused` for the vCPU thread to ACK the state change, but not for `resume()`. In the `resume()` case, the CpuManager oddly "owned" the state change in `state.paused`. This commit changes this so that the vCPU's ACK of its state change from pause->run is also gracefully recognized by `CpuManager::resume()`.

This change ensures proper synchronization and prevents situations in which park() follows right after unpark(), causing deadlocks and other weird behavior due to race conditions. Calling resume() now takes slightly longer, very similar to pause(). Even for 254 vCPUs, however, this is in the range of less than 10ms.

## Reproducer

`ch-remote --api-socket ... pause`

```patch
diff --git a/vmm/src/vm.rs b/vmm/src/vm.rs
index d7bba25..35557d58f 100644
--- a/vmm/src/vm.rs
+++ b/vmm/src/vm.rs
@@ -2687,6 +2687,10 @@ impl Pausable for Vm {
             MigratableError::Pause(anyhow!("Error activating pending virtio devices: {:?}", e))
         })?;
 
+        for _ in 0..1000 {
+            self.cpu_manager.lock().unwrap().pause()?;
+            self.cpu_manager.lock().unwrap().resume()?;
+        }
         self.cpu_manager.lock().unwrap().pause()?;
         self.device_manager.lock().unwrap().pause()?;
```

Since [0] is merged, this fix can be tested, for example, by modifying the pause() API call to run pause() and resume() in a loop a thousand times. With this change, things do not get stuck anymore.

## Outlook

Decades of experience in VMM development have shown us that using many AtomicBools is a footgun. They are not synchronized with each other at all. In the long term, we might want to refactor things to use a single AtomicU64 with different bits having different meanings.

[0] cloud-hypervisor#7290
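To illustrate the fixed handshake described above, where the vCPU, not the CpuManager, acknowledges both transitions in `state.paused` and resume() waits for the run ACK, here is a hedged sketch with illustrative names, not the real vCPU loop:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

struct State {
    pause_requested: AtomicBool,
    // After the fix, written only by the vCPU thread itself.
    paused: AtomicBool,
}

// vCPU thread side of the handshake.
fn vcpu_check_pause(state: &State) {
    if state.pause_requested.load(Ordering::SeqCst) {
        state.paused.store(true, Ordering::SeqCst); // ACK: paused
        while state.pause_requested.load(Ordering::SeqCst) {
            thread::park(); // park() may wake spuriously, hence the loop
        }
        state.paused.store(false, Ordering::SeqCst); // ACK: running again
    }
}

// Manager side: resume() no longer flips `paused` itself; it waits
// for the vCPU's run ACK, so a pause() issued immediately afterwards
// cannot race with a vCPU thread that is still parked.
fn resume(state: &State, vcpu: &thread::Thread) {
    state.pause_requested.store(false, Ordering::SeqCst);
    vcpu.unpark();
    while state.paused.load(Ordering::SeqCst) {
        thread::sleep(Duration::from_millis(1));
    }
}
```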
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Sep 1, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Sep 1, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Sep 2, 2025
olivereanderson pushed a commit to cyberus-technology/cloud-hypervisor that referenced this pull request Sep 2, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Sep 15, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Sep 15, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Sep 16, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 6, 2025

Fix a race condition that happens in resume()-pause() cycles.

It is odd that for pause(), the CpuManager waited via `state.paused` for the vCPU thread to ACK the state change, but not for `resume()`. In the `resume()` case, the CpuManager oddly "owned" the state change in `state.paused`. This commit changes this so that the vCPU ACKs its state change itself in `state.paused` when it transitions from pause->run. Further, `CpuManager::resume()` now gracefully waits for the vCPU to be resumed.

More technically: this change ensures proper synchronization and prevents situations in which park() follows right after unpark(), causing deadlocks and other weird behavior due to race conditions. Calling resume() now takes slightly longer, very similar to pause(). Even for 254 vCPUs, however, this is in the range of less than 10ms, and ultimately we now have correct behaviour.

## Reproducer

Since [0] is merged, the underlying problem can be tested without this commit by modifying the pause() API call to run `CpuManager::pause()` and `CpuManager::resume()` in a loop a thousand times.

`ch-remote --api-socket ... pause`

```patch
diff --git a/vmm/src/vm.rs b/vmm/src/vm.rs
index d7bba25..35557d58f 100644
--- a/vmm/src/vm.rs
+++ b/vmm/src/vm.rs
@@ -2687,6 +2687,10 @@ impl Pausable for Vm {
             MigratableError::Pause(anyhow!("Error activating pending virtio devices: {:?}", e))
         })?;
 
+        for _ in 0..1000 {
+            self.cpu_manager.lock().unwrap().pause()?;
+            self.cpu_manager.lock().unwrap().resume()?;
+        }
         self.cpu_manager.lock().unwrap().pause()?;
         self.device_manager.lock().unwrap().pause()?;
```

## Outlook

Decades of experience in VMM development have shown us that using many AtomicBools is a footgun. They are not synchronized with each other at all. In the long term, we might want to refactor things to use a single shared AtomicU64 with different bits having different meanings.

[0] cloud-hypervisor#7290
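As a rough illustration of the outlook above: a single shared AtomicU64 can carry several flags whose combined state is read in one atomic snapshot, which separate AtomicBools cannot guarantee. The bit layout and names below are hypothetical, not a proposed API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One bit per condition, all in a single shared word.
const PAUSE_REQUESTED: u64 = 1 << 0;
const PAUSED: u64 = 1 << 1;

struct VcpuFlags(AtomicU64);

impl VcpuFlags {
    fn request_pause(&self) {
        self.0.fetch_or(PAUSE_REQUESTED, Ordering::SeqCst);
    }

    fn ack_pause(&self) {
        self.0.fetch_or(PAUSED, Ordering::SeqCst);
    }

    // A single load observes both flags in one consistent snapshot.
    fn snapshot(&self) -> (bool, bool) {
        let f = self.0.load(Ordering::SeqCst);
        (f & PAUSE_REQUESTED != 0, f & PAUSED != 0)
    }
}
```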
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 6, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 6, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 6, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 21, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 21, 2025
phip1611 added a commit to phip1611/cloud-hypervisor that referenced this pull request Oct 22, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 22, 2025