Skip to content

Conversation

@jxdabc
Copy link
Contributor

@jxdabc jxdabc commented Jan 15, 2022

Repo is the standard tool that Google uses to build Android, Chrome OS, Chromium, etc, written in Python.

Many Repo users have encountered resource leak warnings with Python 3.9.
https://bugs.chromium.org/p/gerrit/issues/detail?id=14934&q=component%3Arepo&can=5

I did some work and found that the problem is not caused by the code of Repo, but a bug of the Python library, multiprocess.

To make it simple, the Python script below leaks named resource even when exiting normally. (And be unlinked by resource_tracker with a warning. )

import multiprocessing as mp

global_resource = mp.Semaphore()

def submain(): pass

if __name__ == '__main__':
    p = mp.Process(target=submain)
    p.start()
    p.join()

Tested on macOS with Python 3.9.7

python test.py
resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown.

This bug will 100% reproduce when then main module uses named resources as global variables and uses spawn context, which is the case of Repo on macOS.

This is caused by multiprocess::BaseProcess::_bootstrap.

When a new process is started with multiprocessing.Process.start() in spawn context.

  1. The main module is reloaded in the subprocess (for pickle) in multiprocessing::spawn::_main.
  2. Named resources (such as the semaphore above) in the main module resister their _cleanup into multiprocessing::util::_finalizer_registry, which unlink themselves.
  3. multiprocess::BaseProcess::_bootstrap then clears _finalizer_registry.

When a subprocess is spawned, it is no need to clear util::_finalizer_registry (and no need to call util::_run_after_forkers). Disable clearing _finalizer_registry (and disable call to _run_after_forkers) should fix this bug without breaking anything else.

And I uploaded a MR.

https://bugs.python.org/issue46391

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@jxdabc

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@jxdabc
Copy link
Contributor Author

jxdabc commented Jan 26, 2022

cc @pitrou

@jxdabc
Copy link
Contributor Author

jxdabc commented Feb 1, 2022

@1st1

@arhadthedev
Copy link
Member

arhadthedev commented Jun 2, 2022

Could you merge a fresh main into your branch? The tests are stuck so nobody can even see if anything is broken.

@jxdabc jxdabc force-pushed the fix-issue-46391 branch from dd3fca1 to d8161cb Compare June 5, 2022 22:49
@jxdabc
Copy link
Contributor Author

jxdabc commented Jun 6, 2022

Could you merge a fresh main into your branch? The tests are stuck so nobody can even see if anything is broken.

OK. I've rebased the patch to the main.

@jxdabc
Copy link
Contributor Author

jxdabc commented Jun 6, 2022

Could you merge a fresh main into your branch? The tests are stuck so nobody can even see if anything is broken.

@arhadthedev Hope for review and fix. The problem caused by this bug is really annoying. A warning appears on most of the repo commands I use. #90549

@arhadthedev
Copy link
Member

cc @pitrou and @1st1 once again

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for diagnosing this and finding a fix! Here are a couple comments.

@pitrou pitrou added topic-multiprocessing 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Jun 6, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @pitrou for commit 11b3674851836dfc0cd1a1817e26b625a95bb32b 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 6, 2022
@jxdabc jxdabc force-pushed the fix-issue-46391 branch from 11b3674 to e9acebd Compare June 7, 2022 03:58
@jxdabc jxdabc requested a review from pitrou June 7, 2022 04:26
@jxdabc
Copy link
Contributor Author

jxdabc commented Jun 9, 2022

@pitrou Anything else I could do to get this fixed?

LGTM.

@pitrou: I let you merge it ;-)

@jxdabc jxdabc force-pushed the fix-issue-46391 branch from e6f8e28 to 7cc22ae Compare June 9, 2022 11:58
@pitrou pitrou added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 9, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @pitrou for commit 7cc22ae 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 9, 2022
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks a lot @jxdabc .

@pitrou
Copy link
Member

pitrou commented Jun 9, 2022

Let's wait a bit for the buildbot results now.

@pitrou pitrou changed the title gh-90549: Library multiprocess leaks named resources. gh-90549: Fix leak of global named resources using multiprocessing spawn Jun 9, 2022
@pitrou
Copy link
Member

pitrou commented Jun 9, 2022

We might want to backport this to 3.10, given that this fixes a leak. multiprocessing is however a complex library and there is a non-zero risk that this would cause unforeseen regressions.

@jxdabc
Copy link
Contributor Author

jxdabc commented Jun 9, 2022

We might want to backport this to 3.10, given that this fixes a leak. multiprocessing is however a complex library and there is a non-zero risk that this would cause unforeseen regressions.

@pitrou what I need to do about this backport?

@pitrou
Copy link
Member

pitrou commented Jun 9, 2022

@jxdabc Nothing special, it's a decision core developers make.

@vstinner @pablogsal What do you think?

@vstinner
Copy link
Member

vstinner commented Jun 9, 2022

It's a bugfix, it should be backported.

there is a non-zero risk that this would cause unforeseen regressions.

That's right, but if it introduces a regression, it can be fixed as well.

@jxdabc
Copy link
Contributor Author

jxdabc commented Jun 9, 2022

Let's wait a bit for the buildbot results now.

@pitrou Glad to see all check passed.

@pitrou pitrou added needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes labels Jun 9, 2022
@pitrou pitrou merged commit 30610d2 into python:main Jun 9, 2022
@miss-islington
Copy link
Contributor

Thanks @jxdabc for the PR, and @pitrou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Jun 9, 2022
@bedevere-bot
Copy link

GH-93651 is a backport of this pull request to the 3.11 branch.

@bedevere-bot
Copy link

GH-93652 is a backport of this pull request to the 3.10 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants