Add Surface Laptop 5 lid GPE + s2idle/hibernate fix (v5.4)#2011
Add Surface Laptop 5 lid GPE + s2idle/hibernate fix (v5.4)#2011wowitsjack wants to merge 9 commits into
Conversation
|
Decided to give this patch a try on my Surface Laptop 5 with 6.18.13, but it won't cleanly apply. It's unhappy about the following hunk: @@ -14,6 +14,7 @@ obj-$(CONFIG_SURFACE_AGGREGATOR_TABLET_SWITCH) += surface_aggregator_tabletsw.o
obj-$(CONFIG_SURFACE_DTX) += surface_dtx.o
obj-$(CONFIG_SURFACE_GPE) += surface_gpe.o
obj-$(CONFIG_SURFACE_HOTPLUG) += surface_hotplug.o
+obj-$(CONFIG_SURFACE_S2IDLE_FIX) += surface_s2idle_fix.o
obj-$(CONFIG_SURFACE_PLATFORM_PROFILE) += surface_platform_profile.o
obj-$(CONFIG_SURFACE_PRO3_BUTTON) += surfacepro3_button.oThe header gives |
|
Hm, seems to be a bit wonky for me still, on my first try I somehow got my system into a state where it would suspend and then automatically immediately unsuspend (journalctl output from when that happens). Then the system wouldn't even completely turn off anymore - I tried a reboot and it just hung after systemd's last "Shutting down." message. On the console I still saw output that told me that s2idle_fix recognized me pressing the power button, but I had to long-press it to finally shut down the system. Now that I rebooted I can't reproduce this, however. Like I said this is on 6.18.13, with an i7-1265U. |
|
@wylfen re: Makefile hunk header - Fixed in v2, good catch. The Makefile hunk had a wrong line count in the header. |
|
@wylfen re: suspend loop - Found and fixed in v2. Two things were going wrong:
If you want to re-test, pull the updated branch. The module is also significantly more capable now: EC keepalive prevents death sleep entirely, and lid-open wakes the system within ~5 seconds. |
|
Nice, I'll retest in the coming days and report back. |
|
I am STUCK. D: |
|
Oh, no worries, I appreciate the effort. Could you say which firmware version you updated to? |
SAM firmware: 15.204.139 |
|
Seems I have been on the same one for a while already. What's interesting is that Microsoft lists that particular SAM update for October 2024. There's a more recent one from November 2025, 15.305.139.0, that I don't even have installed yet (but I also basically don't boot Windows on the device anymore) |
4c9694c to
026e497
Compare
|
Hey, sorry for closing this prematurely. I thought a SAM firmware update (15.204.139) had changed the behaviour, but after a fresh OS install and more testing, I slogged away and aligned to the new one v3 is a pretty significant rewrite from v2. The big change: RTC keepalive is gone entirely. Turns out it was never needed. The real problem was in the s2idle wake decision path. When INTC1055 power-gates during s2idle, pin 213 (lid, GPE 0x52) gets its PADCFG0 RXINV bit corrupted. That fires a spurious SCI. In the wake path, The fix is simpler than v2. ACPI wakeup handlers run before
The spurious GPE gets cleaned up before anything else sees it. The actual lid-open decision goes through an LPS0 No more RTC alarm, no more EC keepalive, no more interrupt suppression. Just fix the corruption before the wake path can see it. Still has the safety layers from v2: passive KEY_POWER handler for spurious wake detection, post-resume failsafe with exponential backoff, background RXSTATE polling for when the GPIO edge path breaks after multiple s2idle cycles. Tested on SL5 (i5-1245U), fresh Ubuntu 25.10, kernel 6.18.7-surface-1, same SAM firmware 15.204.139 that I thought broke everything. Hundreds of cycles, no death sleep, no suspend loops. @wylfen would be great if you could test v3 when you get a chance. |
026e497 to
ec1de45
Compare
|
Pushed v4. Two additions on top of v3: SW_LID input events - the module now registers an input device and emits Session locking before suspend - when the module calls Core s2idle fix is unchanged from v3, just these two quality-of-life additions. @wylfen if you're still testing, these shouldn't affect the suspend/resume behaviour at all, just nicer userspace integration. |
|
Oh, nice, I was just to report similar issues for KDE Plasma. I'll retest with those changes. |
Heck yeah, thank you so much! <3 |
|
Alright, so the normal suspend/resume behaviour seems fixed for me now. I've not had suspend loops or death sleep. With the recent changes from v4, userspace behaviour is definitely better, but I still have some feedback there:
Other than that I had to cajole the patch to apply again:
But that's mostly a patch cleanup issue - it would probably be useful to base the patch on some set kernel version and then cherry-pick it onto the other branches to get correct patches. This change as it is with v4 is definitely a step forward in usability for me on my Surface, so thanks a lot! I'll run the current version for at least a week so I can see how it handles when I'm actually actively using the device at work. |
v5: Architectural cleanup per tester feedback. - Removed lock_sessions (loginctl call from kernel): SW_LID events let the DE handle locking natively, no need for kernel-side session lock. - lid_poll_fn no longer calls pm_suspend(): it only emits SW_LID events. The desktop environment decides suspend policy on lid close. This stops the module from overriding DE preferences (e.g. "lock only on lid close"). - Failsafe re-suspend logic is now purely physical: re-suspends when lid is closed and no power button was pressed (spurious wake), stays awake when lid is open. Works regardless of who initiated the suspend. v4: Added SW_LID input events and loginctl lock-sessions. v3: 5-layer architecture with spurious wake, lid resync, background polling. v2: Fixed Kconfig/Makefile patch context. v1: Initial release.
ec1de45 to
aa3e525
Compare
|
Pushed v5, addressing all your feedback: 1) DE preference override - 2) Trigger-happy failsafe - the failsafe now only cares about physical lid state. Lid closed + no power button = spurious wake, re-suspend. Lid open = stay awake, no questions asked. If you hit Sleep from the lock screen with lid open, the system sleeps and wakes normally without the module interfering. 3) lock_sessions removed - agreed, now that SW_LID events work, there's no reason for the kernel to be calling loginctl. DEs handle locking on their own via the lid switch events. Removed The core s2idle fix (PADCFG corruption repair, GPE 0x52 masking, LPS0 hooks, wakeup handler) is unchanged. Just the userspace interaction got cleaned up. Re: patch context issues, I'll take a closer look at getting the Kconfig/Makefile hunks right per kernel version. The current patches are all generated against the same base, which is why the context drifts on some versions. Cherry-picking per branch is the right approach, I just haven't set that up yet. Thanks again for the detailed feedback, especially the KDE perspective. Super valuable having someone test on a different DE. |
|
This is great, thanks. I'll give the new version a look tomorrow. |
|
So I've been looking at the patches and this conversation again and I got to ask: How much am I talking to an LLM and how much of the patches were generated by one? I'm not sure whether this project has any sort of policies for this, but I think it's only respectful to disclose any LLM usage. I think there's a real problem here to be investigated and solved, but I can't personally continue looking at this in good faith until I know what exactly is going on. |
|
@wowitsjack, I'm grateful for your help on this. It looks as though this has been quite frustrating, and I really appreciate your posts and the efforts you've been taking to work through them. I've been suffering from this bug for a while, and am looking forward to the fix. |
I have dyslexia, so all of my comments are passed through CoPilot, so I don't make a fool of myself. Sorry if that's a bit impersonal x__x |
Thank you, really. I super appreciate that! It's been...quite a ride. A long time spent tracing sleep states in Windows and Linux. A LONG time. This has been without a doubt the toughest software/hardware engineering piece I have ever worked on. |
If you'd like to give it a test now, I also maintain a quick easy installer for my own use. - https://github.com/wowitsjack/Surface-Linux-Lid-Fix |
|
Thanks, I tried to install it --- got this error at
I'm trying to install it on top of kernel 6.18.7-surface-1, if that's useful info. |
|
Also found this in the build output, it's probably relevant: |
|
@dmusican That's a Secure Boot thing, not a problem with the module itself. Your kernel is enforcing module signing and the build can't find a signing key, so modprobe rejects it. Easiest fix: disable Secure Boot in your UEFI/BIOS settings. On the Surface, hold Volume Up while powering on to get into the UEFI menu, then turn off Secure Boot under Security. If you want to keep Secure Boot on, you can self-sign the module with a MOK (Machine Owner Key): # Generate a key pair
openssl req -new -x509 -newkey rsa:2048 -keyout MOK.priv -outform DER -out MOK.der -nodes -days 36500 -subj "/CN=My Module Signing Key/"
# Enroll it (will prompt for a password, you'll enter it again on next reboot)
sudo mokutil --import MOK.der
# Reboot, MOK Manager will appear, select "Enroll MOK" and enter your password
# Then sign the module
sudo /usr/src/linux-headers-$(uname -r)/scripts/sign-file sha256 MOK.priv MOK.der /lib/modules/$(uname -r)/updates/surface_s2idle_fix.ko
# Now modprobe should work
sudo modprobe surface_s2idle_fixDisabling Secure Boot is way less hassle though, and fine for a personal laptop. |
|
Thanks --- this isn't my machine, it's one for work where I'm already on thin ice for having dual-booted it to Linux... it also boots to Windows (which I use infrequently), and so I don't know what hassles this is going to cause there or for the admin software they're running on that end. I REALLY appreciate that you're fixing this! And yeah, I could go for the self-signed version (thanks for the instructions on this), but I think I'm going to have to defer on testing and wait for the release. |
|
Hey all, just wanted to address this properly. @wylfen I completely understand your concern, and I'm sorry if the way I've been communicating has made this feel off. I mentioned earlier that I have dyslexia and lean on CoPilot to clean up my writing so it's actually readable, but I totally get that it can come across as impersonal or raise questions. That's on me. I don't want my involvement to be a blocker or a source of friction for anyone trying to get this bug fixed. The last thing I want is for trust issues around how I write comments to get in the way of people who've been dealing with this since 2023. So I think the right call is for me to step back and close this out. @dmusican I'm really sorry it's ending up this way. I know you've been waiting on a fix for a while and I wish I could've gotten this over the line for you. Thanks for testing and for the kind words, @dmusican . No hard feelings. Just a small note: if I do end up pulling this contribution, please don't reuse my code or patches without my explicit permission. I'd like to retain control over what I've written. |
I'm sympathetic to the decision to use LLMs to facilitate communication if you think it's otherwise impossible. I don't mean to shame you for such a decision. Due to the nature of the technology, however, I think it's important to disclose its usage: only once LLM use is documented properly can people take a look at the work in a meaningful way. This is both about the nature of the code produced and about respectful communication. I'm still convinced that problems with sleep are real issues to be documented and fixed and that this patch does prod at the right bits. It's just difficult for me engage with the code directly because I am very inexperienced in hardware engineering and because of the code's complexity and size. So the only thing I can fall back on is trusting your output, which became hard for me to do. It would need someone from this project with hardware experience to properly review this. I do also appreciate the effort, it's unfortunate to have this end on a rather disappointing note. |
Wolfgang, Which is it, mate? You're losing track of your own argument. So is it you appreciate my work, or you "aren't sure [I'm] real" and I'm an LLM? I want to explain how this has actually landed. "If you think it's otherwise impossible." I have dyslexia. I use assistive tools so I can write clearly. That's how I participate in projects like this. When you phrase it as "if you think," you're leaving room to question whether my need is real. I don't believe that was your aim, but I need you to understand how that reads from my side. Asking me to formally disclose and document my use of accessibility tools before my work can be "meaningfully evaluated," and framing that as "respectful communication," I understand the broader concern about LLMs in open source. But applied to someone using assistive technology for a disability, that amounts to asking me to label myself before I'm taken seriously. Would you ask someone using a screen reader to disclose that before engaging with their review? You've mentioned you're "very inexperienced in hardware engineering" and found the code too complex to engage with directly. I respect that honesty. But it means the right move was to flag this for someone with hardware experience. Instead, what happened was a public accusation about my writing style, and I stepped back because I didn't want my involvement to become a blocker for people who need this fix. Since closing this PR, I've had people reaching out in DMs the project's Matrix chat asking if they can still get a copy. I have been updating the community of the unfortunate turn of events your attitude and approach has cascaded into. @dmusican, I'm sorry this has left you hanging. If you have a Discord or wherever works for you, I'll help you get the fix running on your machine directly. You shouldn't have to keep waiting because of this. I'll also be emailing qzed to make it explicitly clear that my code and patches are not to be used or derived from in any capacity by this project. If there are questions about why, they can ask you, Mr. Müller. It's a little off to me, someone who couldn't evaluate the code technically anyway trying to police the disability tools of others. What would formal disclosure have changed for a reviewer who self-admittedly lacks the hardware expertise to evaluate the substance? I hear you that this is disappointing. It is. For everyone. |
|
Hey all, quick update. @qzed reached out to me directly after receiving my email, and we had a good conversation. His position is clear: LLMs are tools, and it's the end product that counts. That's all I needed to hear. I'm reopening this and making the patches fully available under GPL to the linux-surface project and anyone else who wants to use them. No restrictions. The fix works. People need it. That's what matters. @dmusican, the offer still stands if you want help getting it running on your machine in the meantime. |
|
Honestly, I'm not sure what the problem with using or not using LLMs is. LLMs are, in my opinion, tools. In the end, it's the end product that needs to be evaluated, doesn't matter if an LLM was used or not. At this point, I also guess that a lot of people already use some kind of LLM (e.g, copilot) while coding, and I don't necessarily expect everyone to straight up disclose that all the time. Again, the end product matters. Now of course it's your own choice of who (and what tools) to trust or not. I personally always found it a bit hard to just trust anyone on the internet that gives you something to run (yes, also before LLMs). But fwiw, I think at this point, any kind of exploit still very likely needs a human hand. I'll try to review the code when I have some time. Please be civil and respectful. |
|
@wowitsjack, thanks for the offer! I'm going to give this a little time to see if the commit just makes it in, then I don't have to futz with signing it, etc. If I get inspired to try it before that, I will. I'm optimistic that hopefully this just works out, but I'll speak up if I can use some help. Again, thanks for all you've done. |
Replace stripped s2idle-only patches (924 lines) with full v5.2a source (1712 lines) across all 11 kernel versions. v5.2a adds hibernate support with RTC-based time synchronization, post-resume wifi bounce recovery, display reconnect fix, hardened SNTP-free timekeeping for long hibernation, and freeze/thaw PM ops.
|
v5.2a update Patches updated from the stripped s2idle-only source (924 lines) to the full v5.2a module (1712 lines). Core s2idle fix is unchanged, same layered PADCFG repair + GPE masking + LPS0 hooks. The big changes:
All 11 kernel versions (6.6, 6.9-6.18) updated. Tested on SL5 (i5-1245U), Ubuntu 25.10, kernel 6.18.7-surface-1. s2idle and hibernate cycles, including 11h+ hibernation. No death sleep, no PADCFG corruption, correct wall clock on resume. |
v5.3 fixes: - GPIORXDIS toggle after PADCFG restore for pin input re-latch - Display/backlight recovery on s2idle resume with PADCFG corruption - Accurate sleep duration tracking via ktime_get_boottime() - Version string consistency (v5.1a -> v5.3)
|
v5.3 update pushed Hit a confirmed death sleep on lid close while running v5.1a (hadn't rebuilt from v5.2a source, my bad). Dug through the previous boot logs and found the kill chain: After ~51 minutes of s2idle, VNN power-gating corrupted PADCFG0 as expected. The module corrected the register values, but the pin's input buffer was still latched to the corrupted state. RXSTATE read 0 (lid open) even though the lid was physically closed. The module thought it was a genuine wake, didn't re-suspend, but display recovery only ran on the hibernate path, not s2idle. Black screen, system running but no way to interact, eventually re-suspended into a state it couldn't come back from. Also found that v5.3 fixes:
All patches updated across 6.6, 6.9-6.18. |
Update surface_s2idle_fix patches to v5.3b across all kernel versions. New features: - suspend_to_lock: intercept GUI/manual suspend, convert to screen lock + display blank + network disable (lid-close suspend unaffected) - Wake input handler restores display and networking on keypress - SAM wakeup setup for power button wake from s2idle - Power button detection in lps0_check Fixes: - Init-time PADCFG0 restoration now checks GPIORXDIS (bit 8) in addition to RXINV, fixing broken lid detection after module reload following VNN corruption
v5.3c changes: - Fix failsafe death sleep: power button path checks lid_close_reported before starting resync, preventing infinite re-suspend when lid opened during sleep - Add lock_sessions() before all failsafe re-suspend paths so screen is locked after rapid-wake VNN cycles - Add lid_close_reported safety checks in lid_resync_fn to abort if poller detected lid open during backoff - Version bump to 5.3c
GPE refcount fix, full PADCFG0 corruption detection, lock screen flash fix, RXSTATE settling guard, GPE masked during s2idle.
|
@wowitsjack Thanks for all your work! Would you mind opening a PR against the kernel repo? I think it's just a bit easier to review and I have the patch-from-kernel-repo process quite automated now. |
Done! :D |
Fix display wake on lid open: call ULID directly to set ACPI button driver state, enabling real SW_LID 1->0 transition on event0. GPI_GPE_EN re-enable and RXINV write ensure _L52 fires on wake.
Closes #1782
Two patches, both across all kernel versions (6.6, 6.9-6.18).
1. Add SL5 to surface_gpe DMI table
Surface Laptop 5 uses GPE 0x52 for lid state notification, same as the Surface Pro 9, but was missing from the DMI match table.
2. surface_s2idle_fix module (v5.4)
Intel INTC1055 pinctrl power-gating during s2idle corrupts pin 213's PADCFG0 RXINV bit. The flipped bit fires a spurious SCI on GPE 0x52. In the kernel's s2idle wake path,
acpi_ec_dispatch_gpe()callsacpi_any_gpe_status_set(), which promotes any non-EC GPE with status set to a full resume. The corrupted pin causes level-like re-assertion, so the status bit is always set when checked. The system either never wakes (wakeup framework poisoned bypm_system_cancel_wakeup()) or wakes spuriously on every s2idle cycle.Fix: The ACPI wakeup handler runs before
acpi_ec_dispatch_gpe()in the s2idle wake decision path (acpi_check_wakeup_handlers()at sleep.c:777, beforeacpi_ec_dispatch_gpe()at sleep.c:786). The handler:acpi_clear_gpe()(soacpi_any_gpe_status_set()doesn't see it)The actual lid-open decision is deferred to an LPS0
check()callback which reads RXSTATE directly and only callspm_system_wakeup()if the lid is genuinely open.Result: spurious GPEs stay invisible inside the s2idle loop (sub-millisecond, no user-visible wake). Genuine lid-open wakes the system cleanly.
Additional layers:
resume_earlyPADCFG repair as safety netv5.4 changes
acpi_enable_gpeinstead ofacpi_set_gpeto maintain proper ACPI reference counting. Fixes second lid close not being detected after first suspend cycle.resume_early, causing false lid events to logind (which re-suspends via HandleLidSwitch). Moved unmask to after system is fully stable.lock_sessions()now blocks withUMH_WAIT_PROC, plus 500ms delay beforepm_suspendin failsafe to let GNOME render lock screen into frame buffer before freeze.fix_padcfg_corruptionwithsave_all_pinsto capture ACPI's legitimate RXINV changes from _L52, preventing false corruption detection.last_poll_rxstatefromlid_was_closed_at_suspendto prevent false open transitions during RXSTATE settling.v5.3c changes
lid_close_reportedbefore starting resync, preventing infinite re-suspend loop when lid was opened during sleep.lock_sessions()helper called before all failsafe re-suspend paths, so screen is locked after rapid-wake VNN cycles.lid_resync_fnchecks poller state before and after backoff, aborting if lid was opened during the wait.v5.3b changes
suspend_to_lock=1(default), GUI/CLI/keyboard-triggered suspend (lid open) is intercepted at the PM notifier level and converted to: lock screen, disable networking, blank display after 2s. A passive input handler restores display + networking on any keypress. Lid-close suspend passes through to real s2idle with full module protection.v5.3 changes
ktime_get_boottime()Tested on Surface Laptop 5 (i5-1245U), fresh Ubuntu 25.10, kernel 6.18.7-surface-1. s2idle cycles, hibernate cycles, extended sleep (11h+), repeated lid open/close, suspend-to-lock with lid open, rapid-wake VNN cycles with failsafe re-suspend + lock verification.