Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Jan 25, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Improved SSL/TLS shutdown to fully coordinate close_notify exchange with peers across blocking, non-blocking, and timeout modes, using deadline-aware waits and retries.
    • Consistent mapping of SSL_WANT_* conditions and clearer timeout/error reporting.
    • Socket-mode shutdown reliably returns the underlying socket on success; BIO-mode behavior preserved.
    • Cleanup of thread shutdown registries after fork to avoid blocking on nonexistent threads.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 25, 2026

📝 Walkthrough

Walkthrough

Reworks TLS shutdown: BIO path uses a single SSL_shutdown call; socket path performs a deadline-driven loop that flushes close_notify, waits for peer close_notify using select/timeout handling, maps SSL_WANT_* to SSLWant errors or timeouts, and returns the underlying socket on socket-mode shutdown.

Changes

Cohort / File(s) Summary
OpenSSL shutdown logic
crates/stdlib/src/openssl.rs
Split BIO vs socket modes: BIO uses a single SSL_shutdown and maps errors; socket mode loops SSL_shutdown with deadline/select waits, maps SSL_get_error to SSL_ERROR_WANT_READ/WRITE, handles readiness/timeouts/closed sockets, and may return the underlying socket.
TLS shutdown sequencing
crates/stdlib/src/ssl.rs
Expanded shutdown from one-shot flush to TLS-aware sequence: flush close_notify, set deadline when present, loop waiting for peer close_notify with per-iteration readiness/timeouts, process TLS records, then mark Completed, clear connection, and return socket or timeout.
Thread registry cleanup after fork
crates/vm/src/stdlib/thread.rs
Added cleanup of shutdown_handles in after_fork_child: drop dead entries, mark non-current started threads Done, notify waiters, and remove stale entries to avoid waiting on phantom threads in child process.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant TLS as TLS Layer
    participant Socket as Socket
    participant Selector as Select/Deadline

    App->>TLS: shutdown(timeout)
    TLS->>TLS: flush pending TLS data\nsend close_notify

    alt BIO mode
        TLS->>TLS: SSL_shutdown (single call)
        TLS-->>App: return Ok(None) or error
    else Socket mode (non-blocking)
        TLS->>TLS: mark Completed\nclear connection
        TLS-->>App: return Ok(Some(socket))
    else Socket mode (blocking/timeout)
        loop until peer close_notify or deadline
            TLS->>Selector: wait socket readiness (deadline)
            Selector-->>TLS: ready / timeout
            alt timeout
                TLS->>TLS: mark Completed\nclear connection
                TLS-->>App: return timeout_error or Ok(Some(socket))
            else ready
                TLS->>Socket: read/process TLS data
                Socket-->>TLS: data / EOF
                TLS->>TLS: detect peer close_notify?
                alt peer close_notify seen
                    TLS->>TLS: mark Completed\nclear connection
                    TLS-->>App: return Ok(Some(socket))
                else continue loop
                end
            end
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐰 I sent a gentle whisper, soft and tight,

Timed my hops beneath the moonlit night,
I peeked, I waited, then let go the thread,
Socket freed — the handshake tucked to bed,
Hooray! I hop away, all tidy and light.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix ssl shutdown' directly corresponds to the main changes in the pull request, which focus on reworking SSL_shutdown handling across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review January 25, 2026 13:27
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/openssl.rs`:
- Around line 2845-2863: The shutdown path incorrectly treats
SelectRet::TimedOut and SelectRet::Closed as success when SSL_shutdown returns 0
(zeros counter path) and breaks out, ultimately returning Ok(Some(socket));
change the logic inside the branch handling socket_stream.select(SslNeeds::Read,
&deadline) so that on SelectRet::TimedOut and SelectRet::Closed you return an
error (or propagate a timeout/closed error) instead of breaking and continuing
to return Ok(Some(socket)), ensuring SSL_shutdown's timeout/closed cases are
surfaced as failures; adjust any callers of the SSL_shutdown loop (references:
SSL_shutdown, zeros counter, socket_stream.select, SelectRet::TimedOut,
SelectRet::Closed, and the code path that currently returns Ok(Some(socket))) to
handle the propagated error.
- Around line 2845-2862: The ret == 0 path currently continues on
SelectRet::Nonblocking which can busy-loop on nonblocking sockets; modify the
SelectRet::Nonblocking arm inside the ret == 0 handling (where
socket_stream.select(SslNeeds::Read, &deadline) is matched) to return an
SSLWantReadError (or the function's equivalent SSL_WANT_READ error) instead of
just continuing, so callers get a proper WouldBlock/WantRead signal; keep the
zeros counter and the other SelectRet arms (TimedOut/Closed/Ok) unchanged.

In `@crates/stdlib/src/ssl.rs`:
- Around line 4124-4130: The code currently swallows errors from
flush_pending_tls_output in the non-blocking shutdown branch (where conn_guard
is dropped) using `let _ =`, which can hide failures to deliver close_notify;
change this to capture the Result from
flush_pending_tls_output(self.flush_pending_tls_output(vm, None)) and if it Err,
either return that Err to the caller or at minimum log it via the existing
logger before proceeding, and only set *self.shutdown_state.lock() =
ShutdownState::Completed and clear *self.connection.lock() = None when the flush
succeeds; reference the flush_pending_tls_output call and the
shutdown_state/connection locks to locate and update this logic.

Comment on lines 4124 to 4133
// Non-blocking: return immediately after sending close_notify.
// Don't wait for peer's close_notify to avoid blocking.
drop(conn_guard);
let _ = self.flush_pending_tls_output(vm, None);
*self.shutdown_state.lock() = ShutdownState::Completed;
*self.connection.lock() = None;
return Ok(self.sock.clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Silent error suppression may hide close_notify delivery failures.

Line 4127 uses let _ = to discard any error from flush_pending_tls_output. In non-blocking mode, if the flush fails (e.g., socket buffer full, connection reset), the close_notify may not reach the peer, and the caller has no indication of this failure.

Consider propagating the error or at minimum logging it, especially since the code immediately marks shutdown as Completed and clears the connection regardless of flush success.

Proposed fix
 Some(0.0) => {
     // Non-blocking: return immediately after sending close_notify.
     // Don't wait for peer's close_notify to avoid blocking.
     drop(conn_guard);
-    let _ = self.flush_pending_tls_output(vm, None);
+    // Best-effort flush; ignore WouldBlock but propagate other errors
+    if let Err(e) = self.flush_pending_tls_output(vm, None) {
+        if !is_blocking_io_error(&e, vm) {
+            // Non-blocking error is expected, but other errors may indicate
+            // close_notify was not sent. Still complete shutdown to avoid
+            // leaving socket in inconsistent state.
+        }
+    }
     *self.shutdown_state.lock() = ShutdownState::Completed;
     *self.connection.lock() = None;
     return Ok(self.sock.clone());
 }
🤖 Prompt for AI Agents
In `@crates/stdlib/src/ssl.rs` around lines 4124 - 4130, The code currently
swallows errors from flush_pending_tls_output in the non-blocking shutdown
branch (where conn_guard is dropped) using `let _ =`, which can hide failures to
deliver close_notify; change this to capture the Result from
flush_pending_tls_output(self.flush_pending_tls_output(vm, None)) and if it Err,
either return that Err to the caller or at minimum log it via the existing
logger before proceeding, and only set *self.shutdown_state.lock() =
ShutdownState::Completed and clear *self.connection.lock() = None when the flush
succeeds; reference the flush_pending_tls_output call and the
shutdown_state/connection locks to locate and update this logic.

@youknowone youknowone force-pushed the ssl-shutdown branch 2 times, most recently from 92ebf63 to 81685b2 Compare January 26, 2026 06:04
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/ssl.rs`:
- Around line 4169-4191: The shutdown loop currently treats deadline expiry and
self.sock_wait_for_io_with_timeout timing out as a successful shutdown by
breaking the loop; instead, when deadline is reached (the now >= dl branch) or
when timed_out is true, return a timeout error from the enclosing function so
callers see the timeout and can retry the close handshake. Update the code paths
around deadline/remaining_timeout and the result of
self.sock_wait_for_io_with_timeout (the timed_out branch) to return an
appropriate timeout Err variant (consistent with this crate's error type) rather
than breaking and proceeding to clear the connection; apply the same change to
the analogous block at the second location (around lines 4218–4221) so both
timeout cases propagate errors instead of being treated as success.
♻️ Duplicate comments (1)
crates/stdlib/src/ssl.rs (1)

4124-4133: Non‑blocking shutdown completes without peer close_notify.

This path sets ShutdownState::Completed and drops the TLS connection even though peer_closed is still false, so callers can’t retry to observe the peer’s close_notify. That risks reporting a clean shutdown when the peer never replied. Consider keeping the state as SentCloseNotify and returning SSLWantReadError instead of completing.

🛠️ Suggested change
-    *self.shutdown_state.lock() = ShutdownState::Completed;
-    *self.connection.lock() = None;
-    return Ok(self.sock.clone());
+    // Keep TLS state so caller can retry and observe peer close_notify.
+    return Err(create_ssl_want_read_error(vm).upcast());

@youknowone youknowone merged commit edca32a into RustPython:main Jan 26, 2026
23 of 24 checks passed
@youknowone youknowone deleted the ssl-shutdown branch January 26, 2026 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant