Fix WAL data loss on instance hibernation/shutdown (#10341) by abhayclasher · Pull Request #10352 · cloudnative-pg/cloudnative-pg

abhayclasher · 2026-03-22T03:56:17Z

While looking at the WAL archiving logic, I noticed that when a CNPG instance shuts down normally, any final WAL segments sitting in .ready state never get pushed to S3. The archiver only runs during switchover/demotion, not during regular shutdown. This is a problem especially during cluster hibernation — you lose the last batch of transactions.

The fix is straightforward: call ArchiveAllReadyWALs() after the PostgreSQL shutdown completes, the same way we do for switchover. This ensures those final segments don't sit orphaned on local storage.

Fixes #10341

github-actions · 2026-03-22T03:56:29Z

❗ By default, the pull request is configured to backport to all release branches.

To stop backporting this pr, remove the label: backport-requested ◀️ or add the label 'do not backport'
To stop backporting this pr to a certain release branch, remove the specific branch label: release-x.y

When a CNPG instance is shut down (e.g. during hibernation), the last WAL segments might remain on local storage and not be archived to the object store. This is because ArchiveAllReadyWALs() was only triggered during switchover/demotion.

This commit ensures that ArchiveAllReadyWALs() is also called after a successful smart/fast shutdown in the lifecycle loop, preventing data loss of the final WAL segments.

References: cloudnative-pg#10341

Signed-off-by: Abhay Kumar <abhaypro.cloud@gmail.com>

aglees · 2026-03-23T13:57:42Z

👋 I have a feeling that the difference with #10345 is important to the efficacy of the solution.

I think that in the <-ctx.Done() path (line 122), ctx is already cancelled. This means that in this PR the cancelled context will propagate through to the gRPC calls to the plugin sidecar. Those calls will fail immediately with context canceled.
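To illustrate the concern, here is a minimal sketch of how a context-aware call behaves once its context has been cancelled. `archiveWAL` and `archiveAfterCancel` are hypothetical stand-ins, not functions from the CNPG code base; a real gRPC client call is assumed to reject a cancelled context in a comparable way.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// archiveWAL is a hypothetical stand-in for a context-aware archive call
// (for example a gRPC request to a plugin sidecar). It gives up as soon
// as the context it receives is done.
func archiveWAL(ctx context.Context, segment string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("archiving %s: %w", segment, err)
	}
	return nil
}

// archiveAfterCancel mimics the <-ctx.Done() path: the context has been
// cancelled before the archive call is made.
func archiveAfterCancel() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	<-ctx.Done() // returns immediately, the context is already cancelled
	return archiveWAL(ctx, "000000010000000000000001")
}

func main() {
	err := archiveAfterCancel()
	fmt.Println(errors.Is(err, context.Canceled)) // prints "true"
	fmt.Println(err)                              // prints "archiving 000000010000000000000001: context canceled"
}
```

The error is returned without any work being attempted, which is why the failure would only show up as a single context-cancelled log line.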

Due to the select block in lifecycle.go, this might work when the select happens to pick the SIGTERM case, which is non-deterministic. For our hibernation scenario specifically, it would silently fail to archive WALs roughly half the time, with no indication in the logs beyond a context-cancelled error that could easily be overlooked.
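The "roughly half the time" estimate follows from how Go's `select` behaves when more than one case is ready: it picks one of them pseudo-randomly. The sketch below models only that situation, with two plain channels standing in for the signal channel and `ctx.Done()`. Whether both cases really are ready at the same moment in lifecycle.go is the hypothesis above, not something this example proves, and `countPicks` is a hypothetical helper.

```go
package main

import "fmt"

// countPicks runs a two-way select n times with both cases ready and
// counts how often each one is chosen.
func countPicks(n int) (signalPicked, donePicked int) {
	for i := 0; i < n; i++ {
		sigCh := make(chan struct{}, 1) // stands in for the signal channel
		doneCh := make(chan struct{})   // stands in for ctx.Done()

		sigCh <- struct{}{} // a signal is pending
		close(doneCh)       // the context is cancelled

		select {
		case <-sigCh:
			signalPicked++
		case <-doneCh:
			donePicked++
		}
	}
	return signalPicked, donePicked
}

func main() {
	s, d := countPicks(10000)
	// The exact split varies from run to run.
	fmt.Printf("signal case: %d, done case: %d\n", s, d)
}
```

Over many iterations both branches are taken, so any logic that only works in one branch will fail intermittently.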

WDYT?

abhayclasher2 · 2026-03-24T03:56:48Z

Hey @aglees — that's a solid catch. I'm at work right now so haven't had a chance to dig in properly, but I'll update this from my main account tonight with a proper fix (using a fresh context instead of the cancelled one). Will push the changes then. Thanks for flagging it!
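A minimal sketch of the "fresh context" idea, assuming a hypothetical `archiveWAL` helper and an arbitrary 30-second timeout (neither comes from the PR): the archive call is given a new context derived from `context.Background()`, so cancellation of the original context no longer aborts it, while the timeout still bounds how long shutdown can be delayed.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// archiveWAL is a hypothetical stand-in for a context-aware archive call.
func archiveWAL(ctx context.Context, segment string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("archiving %s: %w", segment, err)
	}
	return nil
}

// archiveBothWays attempts the same archive call twice: once with an
// already-cancelled context, once with a fresh, time-bounded one.
func archiveBothWays() (withCancelled, withFresh error) {
	const segment = "000000010000000000000001"

	// The original context, cancelled as in the <-ctx.Done() path.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	withCancelled = archiveWAL(ctx, segment)

	// A fresh context that is independent of the cancelled one.
	// The 30-second timeout is an assumed value for illustration.
	freshCtx, freshCancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer freshCancel()
	withFresh = archiveWAL(freshCtx, segment)

	return withCancelled, withFresh
}

func main() {
	cancelledErr, freshErr := archiveBothWays()
	fmt.Println("cancelled context:", cancelledErr) // prints "cancelled context: archiving 000000010000000000000001: context canceled"
	fmt.Println("fresh context:", freshErr)         // prints "fresh context: <nil>"
}
```

One trade-off: a context built from `context.Background()` does not carry values stored in the original context, such as a request-scoped logger. On Go 1.21 and later, `context.WithoutCancel(ctx)` keeps those values while dropping the cancellation.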

abhayclasher requested a review from a team as a code owner March 22, 2026 03:56

dosubot bot added the size:XS label (this PR changes 0-9 lines, ignoring generated files) Mar 22, 2026

cnpg-bot added the backport-requested ◀️ (this pull request should be backported to all supported releases), release-1.25, release-1.27, and release-1.28 labels Mar 22, 2026

dosubot bot added the bug 🐛 label (something isn't working) Mar 22, 2026

abhayclasher force-pushed the fix/wal-data-loss-10341 branch from 66a4282 to 0c4f837 March 22, 2026 04:09

abhayclasher wants to merge 1 commit into cloudnative-pg:main from abhayclasher:fix/wal-data-loss-10341

4 participants
