Fix WAL data loss on instance hibernation/shutdown (#10341) #10352
abhayclasher wants to merge 1 commit into cloudnative-pg:main
❗ By default, the pull request is configured to backport to all release branches.
When a CNPG instance is shut down (e.g. during hibernation), the last WAL segments might remain on local storage and not be archived to the object store. This is because ArchiveAllReadyWALs() was only triggered during switchover/demotion.

This commit ensures that ArchiveAllReadyWALs() is also called after a successful smart/fast shutdown in the lifecycle loop, preventing data loss of the final WAL segments.

References: cloudnative-pg#10341
Signed-off-by: Abhay Kumar <abhaypro.cloud@gmail.com>
Force-pushed: 66a4282 to 0c4f837
👋 I have a feeling that the difference with #10345 is important to the efficacy of the solution. I think the Due the WDYT?
Hey @aglees, that's a solid catch. I'm at work right now so I haven't had a chance to dig in properly, but I'll update this from my main account tonight with a proper fix (using a fresh context instead of the cancelled one). Will push the changes then. Thanks for flagging it!
While looking at the WAL archiving logic, I noticed that when a CNPG instance shuts down normally, any final WAL segments still in .ready state never get pushed to S3. The archiver only runs during switchover/demotion, not during regular shutdown. This is especially a problem during cluster hibernation, where you lose the last batch of transactions.
The fix is straightforward: call ArchiveAllReadyWALs() after the PostgreSQL shutdown completes, the same way we do for switchover. This ensures those final segments are not left orphaned on local storage.
Fixes #10341