Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Jan 27, 2026

Summary by CodeRabbit

  • New Features

    • Much broader Windows async I/O surface: more socket, file, pipe, event and completion-queue operations exposed.
    • Expanded read/write patterns with richer IPv4/IPv6 address parsing/formatting and buffers that persist across async calls.
    • New callback/queued-completion workflow and lifecycle management for queued async callbacks.
  • Bug Fixes

    • Centralized and improved Windows error mapping and propagation.
    • Fixed file encoding when generating opcode metadata for consistent cross‑environment behavior.
  • Chores

    • Minor spelling/dictionary updates.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 27, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds comprehensive Windows async I/O support: loads Winsock extension functions, expands Overlapped variants and helpers, implements numerous WinAPI/WSA wrappers and IOCP/wait callback plumbing, improves IPv4/IPv6 address parse/format, and fixes UTF‑8 usage in an opcode metadata script.

Changes

Cohort / File(s) Summary
Windows Async I/O Expansion
crates/stdlib/src/overlapped.rs
Add static Winsock extension pointers and initialize_winsock_extensions; implement parse_address / unparse_address (IPv4/IPv6); extend OverlappedData with ReadInto/Write/Accept/Connect/ReadFrom/ReadFromInto/WriteTo and new structs; add many new methods (ReadFile*/ReadFileInto, WSARecv*/WSARecvInto/WSARecvFrom*/WSASend*/WSASendTo, AcceptEx, ConnectEx, DisconnectEx, TransmitFile, ConnectNamedPipe, ConnectPipe, BindLocal, CreateIoCompletionPort, GetQueuedCompletionStatus, PostQueuedCompletionStatus, RegisterWaitWithQueue, UnregisterWait/UnregisterWaitEx, CreateEvent/SetEvent/ResetEvent, WSAConnect, FormatMessage, etc.); change handle signatures to isize; introduce PostCallbackData and a wait/callback registry; centralize Windows error mapping.
Build Script Encoding Fix
scripts/generate_opcode_metadata.py
Use read_text(encoding="utf-8") and write_text(..., encoding="utf-8") to ensure deterministic UTF‑8 I/O.
Spell list update
.cspell.dict/cpython.txt
Minor wordlist adjustments (insert pymain, reorder entries).

Sequence Diagram(s)

sequenceDiagram
    participant Py as Python caller
    participant OV as Overlapped object
    participant Winsock as Winsock / Kernel API
    participant IOCP as IO Completion Port

    Py->>OV: create Overlapped and invoke operation (e.g. AcceptEx / ReadFile / ConnectEx)
    OV->>Winsock: parse_address / prepare buffers / call WinAPI
    Winsock-->>OV: immediate result or ERROR_IO_PENDING
    OV->>IOCP: associate or submit overlapped (CreateIoCompletionPort / PostQueuedCompletionStatus / RegisterWaitWithQueue)
    IOCP-->>OV: completion notification (GetQueuedCompletionStatus)
    OV->>Py: getresult() -> return (buffer, addr, bytes) or raise error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • impl more overlapped #6885 — Direct overlap: modifies crates/stdlib/src/overlapped.rs, adds Winsock extension loading and many of the same Overlapped/WSA APIs.
  • impl more winapi #6529 — Adds/changes Windows I/O primitives (CreateEvent/SetEvent, WriteFile, named pipe helpers) that overlap with the new interop APIs here.

Suggested reviewers

  • ShaharNaveh

Poem

🐰
I dug a tunnel, found a queue,
Winsock hummed, "async work for you!"
Overlapped paws nudge and wait,
Ports and events orchestrate,
The rabbit hops — I/O runs true. 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 73.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'impl more overlapped' is too vague and generic. It uses non-descriptive terminology that doesn't convey the specific purpose or scope of the substantial changes being made. Provide a more descriptive title that clarifies the main objective, such as 'Add Winsock extensions and expanded overlapped I/O operations' or 'Implement Windows IOCP and async overlapped operations'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the overlapped branch 2 times, most recently from 29bed98 to a6457db Compare January 27, 2026 05:46
@youknowone youknowone marked this pull request as ready for review January 27, 2026 05:46
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/overlapped.rs`:
- Around line 835-841: The WSABUF creation for WSASend currently passes a null
pointer for non-contiguous buffers (same bug as WSARecvInto); update the code
that constructs the WSABUF in the WSASend path to use the contiguous_or_collect
pattern (like in WSARecvInto) instead of directly using
buf.as_contiguous().map(...).unwrap_or(null_mut()); ensure you allocate/collect
into a contiguous temporary when needed and use that temporary's pointer for
buf, keeping len = buf_len as u32; reference WSABUF, WSASend, and the
contiguous_or_collect helper/logic to locate where to apply the change.
- Around line 1390-1396: The WSARecvFromInto call builds a WSABUF from
buf.as_contiguous() which results in a null pointer for non-contiguous buffers;
apply the same contiguous_or_collect pattern used elsewhere: detect
non-contiguous buffers, collect them into a temporary contiguous Vec (or slice)
and use its pointer for WSABUF.buf, otherwise use buf.as_contiguous().as_ptr();
ensure the temporary collection outlives the WSARecvFromInto call (store it in a
local variable scoped for the syscall) and update the WSABUF.len accordingly so
WSARecvFromInto always receives a valid contiguous pointer.
- Around line 1227-1233: The WSABUF construction for WSASendTo currently uses
buf.as_contiguous().map(...).unwrap_or(null_mut()) which yields a null pointer
for non-contiguous buffers; update the WSASendTo call to follow the
contiguous_or_collect pattern used elsewhere: detect if buf.as_contiguous()
returns Some and use that pointer, otherwise collect into a contiguous Vec
(e.g., via contiguous_or_collect helper), keep that Vec alive in scope, and pass
its pointer into the WSABUF.len/buf fields so WSASendTo never receives a null
pointer; reference WSABUF, WSASendTo and the contiguous_or_collect pattern when
making the change.
- Around line 726-732: The WSABUF creation currently passes a null pointer when
buf.as_contiguous() returns None; instead ensure the buffer is made contiguous
(like ReadFileInto/WriteFile do) before building WSABUF: use the
contiguous_or_collect pattern on the incoming buffer (e.g., call
buf.contiguous_or_collect into a temporary Vec<u8> when as_contiguous() is
None), then set WSABUF.buf to the temporary Vec's pointer and WSABUF.len to the
proper length so WSARecv never receives a null data pointer; update the
surrounding function (the WSARecv call site in overlapped.rs where WSABUF is
constructed) to manage the temporary storage's lifetime until the
async/overlapped operation is initiated.
🧹 Nitpick comments (4)
crates/stdlib/src/overlapped.rs (4)

82-84: Minor: Race condition window in initialization check.

The check ACCEPT_EX.get().is_some() provides early return, but there's a small window where two threads could both pass this check before either completes initialization. While OnceLock::set handles this safely (only first setter wins), both threads would create and close a socket. This is harmless but slightly wasteful.

Consider using OnceLock::get_or_try_init for a more idiomatic pattern, though current implementation is functionally correct.


1171-1197: ConnectPipe should be a module-level function, not a method.

ConnectPipe doesn't use zelf and is essentially a static function. In CPython's _overlapped module, this is a module-level function, not a method on Overlapped. Consider moving it to be a #[pyfunction] for API consistency.

♻️ Suggested refactor

Move ConnectPipe outside the impl Overlapped block and change the attribute:

-        // ConnectPipe - this is a static method that returns a handle
-        #[pymethod]
-        fn ConnectPipe(address: String, vm: &VirtualMachine) -> PyResult<isize> {
+    #[pyfunction]
+    fn ConnectPipe(address: String, vm: &VirtualMachine) -> PyResult<isize> {

1692-1705: Incorrect MAKELANGID calculation.

The language ID calculation on line 1705 uses (SUBLANG_DEFAULT << 10) | LANG_NEUTRAL, but the Windows MAKELANGID macro is defined as (sublang << 10) | lang. With SUBLANG_DEFAULT = 1 and LANG_NEUTRAL = 0, this produces 0x400, which is correct.

However, a clearer approach would be to use 0 (system default) or define a MAKELANGID helper for maintainability.


359-369: IPv6 address formatting produces non-canonical representation.

The IPv6 address formatting on lines 359-369 produces uncompressed addresses (e.g., 0:0:0:0:0:0:0:1 instead of the canonical compressed form ::1). While technically valid per RFC 4291, this differs from Python's standard socket.inet_ntop() behavior.

Consider applying IPv6 address compression to maintain consistency with CPython's socket module, especially if Python code performs string-based address comparisons.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/overlapped.rs`:
- Around line 44-126: The early-return in initialize_winsock_extensions only
checks ACCEPT_EX and can return when others are unset, causing later .unwrap()
panics; change the check to verify all four OnceLock variables (ACCEPT_EX,
CONNECT_EX, DISCONNECT_EX, TRANSMIT_FILE) are set before returning (e.g.,
replace the single ACCEPT_EX.get().is_some() check with a combined check that
all four .get().is_some() are true), so the function only returns early when
every extension pointer is initialized.
- Around line 1364-1447: In WSARecvFromInto validate the user-provided size
against the actual contiguous buffer length before creating WSABUF: inside the
WSARecvFromInto function (where contiguous is obtained and before wsabuf is
constructed), convert contiguous.len() to u32 safely and either reject sizes
greater than the buffer (return a buffer/value error) or clamp the length (e.g.,
use min(size, contiguous_len_u32)); then use that validated length for
WSABUF.len to prevent overruns. Ensure the error path resets inner.data
consistently (same place where other failures set OverlappedData::NotStarted)
and keep references to OverlappedReadFromInto and inner.overlapped unchanged.
- Around line 1566-1651: The RegisterWaitWithQueue path currently leaks
PostCallbackData if UnregisterWait*/UnregisterWaitEx runs before the callback;
fix by tracking callback allocations and synchronizing free: create a global
concurrent registry (e.g., DashMap or Mutex<HashMap>) keyed by the wait HANDLE
(new_wait_object) that stores the Box pointer (data_ptr) wrapped with an atomic
guard (e.g., Arc<AtomicBool> or AtomicUsize) to detect/avoid double-free, insert
the entry before/after successful RegisterWaitForSingleObject in
RegisterWaitWithQueue, and on failure free as before; in post_to_queue_callback,
check and atomically set the guard to claim ownership, remove the entry from the
registry if present, then call Box::from_raw to free only if claimed; in
UnregisterWait and UnregisterWaitEx, look up and remove the registry entry and
atomically claim/free the Box if it wasn't already freed by the callback (and
only call UnregisterWait*/UnregisterWaitEx afterwards or handle return/error
transparently).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/overlapped.rs`:
- Around line 82-84: The early-return only checks ACCEPT_EX so if initialization
fails partway the other three locks remain unset and later .unwrap()s panic;
change the init to be atomic: either check that all four statics are already
Some before returning (ACCEPT_EX, CONNECT_EX, DISCONNECT_EX, SEND_EX) or,
better, perform initialization into local temporaries and only assign them to
the four static cells under a single Once/AtomicBool/OnceCell guard so partial
initialization cannot be observed; ensure you do not early-return after setting
just ACCEPT_EX and that any failure leaves the statics unchanged so subsequent
calls will retry the full init.

@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin overlapped

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/overlapped.rs`:
- Around line 1577-1732: The callback can dereference freed PostCallbackData
(use-after-free) because cleanup_wait_callback_data frees the Box while
post_to_queue_callback may still read fields; switch to reference counting so
the allocation lives while callbacks run: change PostCallbackData instances to
be stored in Arc<PostCallbackData> (remove manual Box::from_raw frees and the
freed AtomicBool), update WAIT_CALLBACK_REGISTRY to hold Arc<PostCallbackData>
(not CallbackDataPtr) and make RegisterWaitWithQueue insert an Arc clone into
the registry and pass a raw Arc pointer or weak reference to the native
callback, and update post_to_queue_callback to obtain an Arc clone (ensuring the
struct stays alive while dereferencing completion_port/overlapped) and drop it
when done; also adjust
cleanup_wait_callback_data/UnregisterWait/UnregisterWaitEx to remove the Arc
from the registry without freeing the underlying data (letting Arc drop when all
callbacks finish).
🧹 Nitpick comments (1)
crates/stdlib/src/overlapped.rs (1)

821-879: WSASend/WSASendTo reject non-contiguous buffers, unlike ReadFileInto/WriteFile in the same module.

The implementation uses buf.as_contiguous() which rejects non-contiguous buffers with a buffer error. However, other overlapped I/O methods in this file (ReadFileInto, WriteFile) use buf.contiguous_or_collect() to automatically handle non-contiguous buffers by collecting them into a contiguous copy. For consistency within this module's API, consider using the same contiguous_or_collect pattern for WSASend (and WSASendTo at line 1224) to accept non-contiguous buffers transparently.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/overlapped.rs`:
- Around line 610-618: The ReadFileInto call uses buf.contiguous_or_collect
which can create a temporary buffer that will be dropped while an
overlapped/asynchronous ReadFile is in-flight, causing data to be written into
freed memory; update the ReadFileInto path (the code that calls
buf.contiguous_or_collect and invokes ReadFile with inner.overlapped) to
explicitly reject non-contiguous buffers instead of collecting into a temporary
(mirror the approach used in WSARecvInto), returning an appropriate error when
buf.is_contiguous() is false, so the user's buffer remains valid for the
lifetime of the overlapped operation.
🧹 Nitpick comments (1)
crates/stdlib/src/overlapped.rs (1)

361-378: Consider using canonical IPv6 formatting for consistency with Python.

The current IPv6 address formatting outputs the full expanded form (e.g., 0:0:0:0:0:0:0:1) whereas Python's socket module typically uses the compressed canonical form (e.g., ::1). This may cause string comparison issues if Python code expects the canonical format.

This is a minor compatibility concern that may or may not matter depending on usage patterns.

Add comprehensive Windows async I/O support to the overlapped module:

Winsock Extensions:
- Initialize AcceptEx, ConnectEx, DisconnectEx, TransmitFile via WSAIoctl
- Proper cleanup with all four extension locks checked

Overlapped Methods:
- ReadFile, ReadFileInto: Async file reading
- WriteFile: Async file writing
- WSARecv, WSARecvInto: Async socket receive
- WSASend: Async socket send
- WSARecvFrom, WSARecvFromInto: Async UDP receive with address
- WSASendTo: Async UDP send to address
- AcceptEx: Async socket accept
- ConnectEx: Async socket connect
- DisconnectEx: Async socket disconnect
- TransmitFile: Async file transmission over socket
- ConnectNamedPipe, ConnectPipe: Named pipe support

Module Functions:
- CreateIoCompletionPort: IOCP creation/association
- GetQueuedCompletionStatus: Dequeue completion packets
- PostQueuedCompletionStatus: Post completion packets
- RegisterWaitWithQueue: Register wait with completion port callback
- UnregisterWait, UnregisterWaitEx: Unregister waits with proper cleanup
- BindLocal: Bind socket to local address
- CreateEvent, SetEvent, ResetEvent: Event object management
- FormatMessage: Windows error message formatting

Safety & Correctness:
- Validate buffer contiguity before async operations
- Validate size parameters to prevent buffer overflow
- Use Arc for RegisterWaitWithQueue callback data to prevent UAF
  (callback may run after UnregisterWait returns per Windows API docs)
@youknowone youknowone merged commit e2fda95 into RustPython:main Jan 28, 2026
10 of 13 checks passed
@youknowone youknowone deleted the overlapped branch January 28, 2026 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant