Skip to content

Fix several MacOS memory management bugs#31769

Merged
greglucas merged 20 commits into
matplotlib:mainfrom
iccir:main
Jun 6, 2026
Merged

Fix several MacOS memory management bugs#31769
greglucas merged 20 commits into
matplotlib:mainfrom
iccir:main

Conversation

@iccir

@iccir iccir commented May 28, 2026

Copy link
Copy Markdown
Contributor

PR summary

This pull request fixes several memory management issues in _macosx.m. It allows for windows to be properly dealloc'd, which should fix #31106.

I realize that I'm a stranger and first-time contributor, please feel free to push back and be honest! I use to do these types of object ownership audits when I worked at Apple, but it's been almost 15 years since I've looked at manual-reference-counted code :)

Autorelease Pools

One of the root causes of #31106 is that each NSWindow allocated in FigureManager_new() was added to an NSAutoreleasePool that wasn't drained by the time -[NSApplication run] was called. macOS performed several -retain/-autorelease pairs on the windows. By the time our show() function was called, the retain count of each windows was over 600 (See #31751).

Pools at Boundaries

Although messy, a good guideline is to setup an autorelease pool each time you cross from the Python layer into the Objective-C layer. This prevents objects from going into a pool that you have no control over, and may never be drained.

I have added @autoreleasepool blocks to every static function exposed to Python. Some of these aren't needed since there are no Obj-C calls (event_loop_is_running(), Timer_dealloc(), etc). I can removed the unneeded ones; however, I felt like it was better to be consistent and guard against the possibility of a future contributor using Obj-C from event_loop_is_running()/etc and then re-introducing this issue.

I followed the existing coding style and indented each @autoreleasepool block. This resulted in a messy diff due to the number of lines changed. I could instead use preprocessor macros similar to Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS without indentation. Perhaps something like BEGIN_OBJC_CALLS?

Pools in Loops

When a loop can iterate indefinitely, it's a good idea to create a pool for each iteration. I did so in FigureCanvas__start_event_loop() and flushEvents(). There existed a possibility that the call to -sendEvent: could add additional events to the event queue and thus spin the loop indefinitely without pool drainage.

Pools before -run

The structure of show() and wait_for_stdin() is a bit different. Here, we need to use two pools. The first pool will catch the autoreleased value of -[NSApp windows] and will be drained prior to the call to [NSApp run]. Without two pools, the windows will have a +1 retain count and will be waiting to be drained while [NSApp run] is running.

NSTrackingArea Leak

#31754, also fixed by PR #31661 . I can back this out if desired.

The NSTrackingArea created in FigureCanvas_init() was leaking. The +alloc/-init needs a balancing -release.

Technically, NSTrackingArea.owner was an unsafe assign reference until macOS 10.13 when it was turned into a safer weak reference. There now exists the possibility of a crash on macOS 10.12 if the owning NSView is deallocated before the NSTrackingArea. I'm not sure if this is worth addressing, however. I can fix it if desired.

wake_on_fd_write leaks

#31755

There are two leaks in wake_on_fd_write():

  1. The NSFileHandle is never -releaseed.
  2. The notification object returned from -addObserverForName:object:queue:usingBlock: is retained by the system and not released until -removeObserver:.

I fixed both of these by adding cleanup code to the end of the block.

MatplotlibAppDelegate

With pools in place and draining, the MatplotlibAppDelegate object created in lazy_init() was short-lived. As NSApplication.delegate is a weak property, there was no strong reference to it.

Per ARC conventions, I made a new static variable to point to the object. While this isn't necessary under manually reference counting, it's typically used to indicate that an +alloc/-init without a balanced -release is intentional. It also saves a step if a migration to ARC is ever desired.

isReleasedWhenClosed

Setting isReleasedWhenClosed to NO has been the standard for programmatically-created windows since the introduction of ARC in 2011. Parts of AppKit, such as NSWindowController, will automatically call [theWindow setReleasedWhenClosed:NO] on adopted windows.

While the existing code should correctly release the window per historic API documentation, I'm not sure what kind of test coverage isReleasedWhenClosed=YES windows have inside of Apple, as the recommendation is to use NSWindowController.

I changed the window to isReleasedWhenClosed=NO and modified the calls to -close as follows:

    [self->window close];
    [self->window setDelegate:nil];
    [self->window release];

The setDelegate:nil call is needed as NSWindow.delegate didn't become weak until macOS 10.13. I believe that matplotlib officially still supports 10.12.

Testing

Checking for Over-releases

Historically, fixing leaks in one section of code can expose over-releases/crashes in other sections. Prior to this PR, we are "leaking" several Objective-C objects due to them going into a pool that is never drained.

To mitigate this, I ran several built-in examples with permutations of the following environmental variables:

NSZombieEnabled=YES
NSDeallocateZombies=NO
NSAutoreleaseFreedObjectCheckEnabled=YES
MallocHelp=YES
MallocCheckHeapEach=100000
MallocCheckHeapStart=100000
MallocScribble=YES
MallocGuardEdges=YES
MallocCheckHeapAbort=1

I also checked using Instruments with the Zombies template.

That said, the amount of "leaked" objects was large, so this PR may expose existing over-releases as a crash.

Checking for Leaks

I ran the following example and then attached with Instruments with the Allocations template.

import matplotlib.pyplot as plt
import os
import gc

input(f"PID: {os.getpid()}\nPress Enter to continue...")

while True:
    gc.collect()
    fig, ax = plt.subplots()
    ax.bar(['moo', 'foo'], [157, 42])
    plt.show()

I performed several iterations of clicking "Mark Generation" and then closing the window. I proceeded to let the loop run 20+ more times in order to let Python and malloc clear/recycle any memory regions. I found three leaks (the NSTrackingArea in FigureCanvas_init() and the two in wake_on_fd_write()).

AI Disclosure

  • I used AI to search through historic macOS header files and determine when various properties moved from assign to weak. Apple doesn't make this information easily accessible.
  • I also had AI proofread this markdown document.
  • No code was modified by AI, nor did I use any code suggested by AI.

PR checklist

Note

I ran pytest and got:
8858 passed, 1555 skipped, 28 xfailed, 2 xpassed

As a first time contributor, I'm not sure how concerned I should be with "28 xfailed". I don't believe my change could affect those tests.

iccir added 8 commits May 26, 2026 11:42
Before calling [NSApp run], we need to drain the previous pool as the windows have been added to it during enumeration.
There exists a (likely rare) scenario in which new NSEvents can be added to the queue indefinitely, and thus the outermost pool will never be drained. Hence, create a new pool and drain it once per iteration.
Now that we are setting up and draining pools, our MatplotlibAppDelegate is short lived, as NSApplication.delegate is a weak reference.
The alloc/init of the NSFileHandle needs a -release

The object returned by -addObserverForName:object:queue:usingBlock: is retained by the system and needs a corresponding -removeObserver
The call to -[NSApp windows] needs to be inside of an @autoreleasepool block, else there exists the possibility that the windows array will go into a lower pool that won't be drained.
Using setReleasedWhenClosed:NO has been the standard for programmatically-created windows since the introduction of ARC.

Also add calls to setDelegate:nil as NSWindow.delegate was assign until macOS 10.13.
@github-actions

Copy link
Copy Markdown

Thank you for opening your first PR into Matplotlib!

If you have not heard from us in a week or so, please leave a new comment below and that should bring it to our attention. Most of our reviewers are volunteers and sometimes things fall through the cracks. We also ask that you please finish addressing any review comments on this PR and wait for it to be merged (or closed) before opening a new one, as it can be a valuable learning experience to go through the review process.

You can also join us on discourse chat for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide.
Please let us know if (and how) you use AI, it will help us give you better feedback on your PR.

We strive to be a welcoming and open project. Please follow our Code of Conduct.

@scottshambaugh scottshambaugh changed the title Closes #31106, #31754, #31755 Fix several MacOS memory management bugs May 28, 2026
@QuLogic QuLogic requested a review from greglucas May 28, 2026 19:06
@iccir

iccir commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

I did some more thinking about the "Pools at Boundaries" case. If we used preprocessor macros at the start and end of the function, it would:

  • Keep code at the same indentation level.
  • Showcase the methodical use of an autorelease pool ("all calls from Python to Obj-C need pools at the boundary") from special cases, which could use @autoreleasepool directly.
  • Allow a @try/@catch to catch any Obj-C exceptions that would get thrown.
  • Possibly perform a main thread check, if the exposed-to-Python methods aren't suppose to be used on non-main threads.

For example, the show() function could become:

#define BEGIN_PYTHON_TO_OBJC_SECTION \
    @autoreleasepool { @try {

#define END_PYTHON_TO_OBJC_SECTION \
    } @catch (NSException *exception) { \
        PyErr_SetString(PyExc_RuntimeError, [[exception reason] UTF8String]); \
    } }

…

static PyObject*
show(PyObject* self)
{
    BEGIN_PYTHON_TO_OBJC_SECTION

    // -objectEnumerator will add all windows to the topmost pool,
    // wrap in @autoreleasepool as -[NSApp run] is long-running.
    @autoreleasepool {
        [NSApp activateIgnoringOtherApps: YES];
        NSArray *windowsArray = [NSApp windows];
        NSEnumerator *enumerator = [windowsArray objectEnumerator];
        NSWindow *window;
        while ((window = [enumerator nextObject])) {
            [window orderFront:nil];
        }
    }

    Py_BEGIN_ALLOW_THREADS
    [NSApp run];
    Py_END_ALLOW_THREADS
    
    END_PYTHON_TO_OBJC_SECTION
    
    Py_RETURN_NONE;
}

@QuLogic

QuLogic commented May 29, 2026

Copy link
Copy Markdown
Member

Keep code at the same indentation level.

I don't know much about the internals or best naming, but the macros do have some value, as otherwise the git blame will be almost completely broken.

@iccir

iccir commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

I don't know much about the internals or best naming, but the macros do have some value, as otherwise the git blame will be almost completely broken.

Yeah, that's one of my big hesitations with my patch as-is, I really dislike the idea of clobbering so much of the blame history!

iccir added 2 commits May 29, 2026 11:05
Rather than using @autoreleasepool and indenting several lines of code (which will break `git blame`), use BEGIN_OBJC_ENTRY and END_OBJC_ENTRY.
@iccir

iccir commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

The test failures above seem to be related to some issue with time zones. While I don't think it's related to my changes, please let me know if I need to rerun the tests somehow or address this on my end.

@greglucas

Copy link
Copy Markdown
Contributor

This is great! Thank you for all the great explanations and walkthrough of the details you've updated, it really helps.

I unfortunately don't have a mac as easily at hand these days, but I'll try and test this out later this weekend if I can. I read through the code and had AI do a quick sanity check as well and am leaving my human written comments inline.

It also saves a step if a migration to ARC is ever desired.

I am interested in moving this project to ARC! I did attempt that a while ago, but we ended up having a segfault when using the subplot tool icon to spawn a new window and then close that, so we reverted that commit. See these references.
#23060
#23653

If you have any ideas of what needs to be done for the ARC work, that would be a huge help. I was just doing that with attempting to read documentation and listening to the compilers, but it didn't seem like it would be a huge lift... I would make sure you test the subplot tool icon to make sure that works as expected still even with these changes. I would test by closing the windows in different orders and closing/reopening the subplots etc. just to make sure it is keeping the count as expected.

Comment thread src/_macosx.m Outdated
Comment thread src/_macosx.m
Comment thread src/_macosx.m
Comment thread src/_macosx.m
Comment thread src/_macosx.m Outdated
Comment thread src/_macosx.m Outdated
@iccir

iccir commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

@greglucas - Note that Objective-C exceptions are regarded to be fragile and are not to be used like exceptions in other programming languages.

From Exceptions and Cocoa:

"The Cocoa frameworks are generally not exception-safe. The general pattern is that exceptions are reserved for programmer error only, and the program catching such an exception should quit soon afterwards."

Apple Objective-C frameworks instead use a (verbose) pattern of:

- (BOOL) trySomethingAndReturnError:(NSError **)error

When Apple made Swift, they automatically translated this pattern to Swift try/catch blocks (See Error Handling).

Thus, it makes sense to have our …_OBJC_ENTRY macros handle Objective-C exceptions, and we should propagate this up to a Python exception. However, from that point onwards, we need to regard the whole UI layer as potentially invalid. It's not suppose to be a recoverable situation (which sounds strange if you are use to exceptions in other languages)!

@greglucas

Copy link
Copy Markdown
Contributor

Thus, it makes sense to have our …_OBJC_ENTRY macros handle Objective-C exceptions, and we should propagate this up to a Python exception. However, from that point onwards, we need to regard the whole UI layer as potentially invalid. It's not suppose to be a recoverable situation (which sounds strange if you are use to exceptions in other languages)!

I agree with this. I think that we need to return NULL for Python to see that there was something wrong in the exception. If we only Py_RETURN_NONE (even when objc caught an exception), the Python side will not necessarily know that an error was set and just continue on thinking things are OK because it got a None response at the interface. Here are some examples of that pattern: https://docs.python.org/3/extending/extending.html

Sorry for the noise, my text editor automatically keeps trying to re-indent these when I save.

@greglucas greglucas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all of those comments. I think this looks good now! We can keep improving in follow-up PRs as I think this lays a nice groundwork.

@iccir

iccir commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

@greglucas - Do I need to do anything else on my end to merge this (or is that something you do)?

In the meantime, I'm attempting to install macOS Sierra 10.12 on a test machine and plan to dig into the Window<->FigureManager reference cycle. I'll likely address that in my ARC work.

Comment thread src/_macosx.m
}
}

static int wait_for_stdin() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already an autoreleasepool in most of this function; do we need a second one?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The first pool covers entry, the second pool is due to being inside of an while loop. In general: pools are cheap to push/pop and better than leaking objects.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no while loop in this function, though?

@iccir iccir Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I was using my phone's web browser before and your comment looked like it was related to a different function.

In wait_for_stdin(), we do need multiple autorelease pools, but the code as written is incorrect.

static int wait_for_stdin() {
    BEGIN_OBJC_ENTRY

    …
    if (![[NSApp windows] count]) {
      flushEvents();
      return 1;
    }

At the end of this code snippet, the call to [NSApp windows] has made a copy of the windows array and then added it to the autorelease pool set up in BEGIN_OBJC_ENTRY. This pool isn't drained until the end of the function.

However, we then go on to do a call to [NSApp run], which may run for a very long time. Our windows are retained and in the outermost pool and may not be released for a while.

The if block needs to be in its own @autoreleasepool, and the rest of the function can simply use the pool set up by BEGIN_OBJC_ENTRY.

Good catch!

Comment thread src/_macosx.m Outdated
Comment on lines 583 to 586
NSEvent* event = [NSApp nextEventMatchingMask: NSEventMaskAny
untilDate: date
inMode: NSDefaultRunLoopMode
dequeue: YES];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, these were (perhaps wrongly?) aligned on the colon.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me double check this when I get back to a desktop. Colons should be aligned to each other, but the merge may have messed up alignment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that got misaligned with the merge. I'll fix it. Thanks!

Comment thread src/_macosx.m Outdated
}
@try {

NSApplication* app = [NSApplication sharedApplication];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably dedent these bits at least.

@QuLogic QuLogic left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should squash this, given the indent/dedent back-and-forth.

@greglucas greglucas merged commit 1e6bcaf into matplotlib:main Jun 6, 2026
41 of 42 checks passed
@greglucas

Copy link
Copy Markdown
Contributor

Thanks @iccir! We hope to see you again. I'm definitely interested in reviewing the other updates you mentioned if you're able to get around to them.

@iccir

iccir commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @iccir! We hope to see you again. I'm definitely interested in reviewing the other updates you mentioned if you're able to get around to them.

Thanks for merging! I'll be working on the ARC changes next. Expect a PR once I get some more testing done!

@QuLogic

QuLogic commented Jun 9, 2026

Copy link
Copy Markdown
Member

@meeseeksdev backport to v3.11.x

QuLogic added a commit that referenced this pull request Jun 10, 2026
…769-on-v3.11.x

Backport PR #31769 on branch v3.11.x (Fix several MacOS memory management bugs)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

3 participants