-
Notifications
You must be signed in to change notification settings - Fork 6.2k
mgr:python: avoid pyo3 errors by running certain cryptographic functions in a child process #62951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a2721b3 to
008d669
Compare
|
https://pulpito.ceph.com/adking-2025-04-29_23:34:48-orch:cephadm-jjm-wip-pycry-remoted-distro-default-smithi/ |
|
LGTM. I just have a question around the code duplication, I can see it defaulting to the remote option, which is the forked subprocess. Great. |
Good Q: The The unit testing story is a bit more complicated with the extremely heavy use of mocks that mgr module units tests use. In that case I think it's good enough to let all code use the default as that's the more complex path. That said, we could probably add a set of dedicated "unit" tests to the new dir for focused testing on the new interface and backend implementations. I will look into that - but perhaps in a follow up PR. |
|
One thing I meant to note in my previous comment: when I mention that the |
008d669 to
13e31c9
Compare
|
jenkins test make check arm64 |
|
@epuertat - I am ready to take this out of draft but I'd like a general "thumbs up/down" from you before I do so. I don't need a full review at the moment just a quick check on the overall approach. |
|
Teuthology run results - default base OS packages: |
Looks good to me, @phlogistonjohn ! When I saw the "remote" term over there, I remembered that someone (@batrick) suggested to have these crypto methods in a new helper-only mgr module, and use the |
batrick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I saw the "remote" term over there, I remembered that someone (@batrick) suggested to have these crypto methods in a new helper-only mgr module, and use the mgr.remote() method to invoke them. It'd be a middle ground between the sub-interpreter and the stand-alone interpreter approaches.
Thanks for the ping @epuertat . Note I don't think we typically fork in Ceph processes so this may run afoul of restrictive container settings like NPROC=1. Something to be on the lookout for. A pending release note would be appropriate.
| # sharing the same protocol. | ||
| # For instance we could have a persistent process listening on a unix | ||
| # socket accepting the crypto functions as RPCs. For now, we keep it | ||
| # simple though :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| name = os.environ.get(_CC_ENV, '') | ||
| _check_name(name) | ||
| if not name: | ||
| name = _CC_REMOTE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not possible to attempt an internal import and fallback to remote?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, debug messages for these branches would be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug sounds good. Unfortunately, an attempt to import the module in the "wrong" subintepreter that succeeds will pollute the system. IOW, if we import the dashboard and it probes the cryptography module and it succeeds it will then break a subsequent import of cephadm. So it's not really "safe" to try to import it proactively.
|
|
||
|
|
||
| def main() -> None: | ||
| # create the top-level parser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is a generic mechanism for this. It has to be somewhat common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you are referring to here, setting up an argument parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean a python module that is providing RPCs over a socket / pipe.
|
The idea of moving the use of cryptography into a central mgr module was discussed in a ceph orch weekly. One of the issues we discussed was that cephadm "needs cryptography the most" so that it can use asyncssh. Thus we'd probably need to do it in a less clean way with cephadm handling the crypto calls - rather than a nice central crypto module that does one thing assisting the other modules. Of course, all of this is a hopefully temporary workaround once the pyo3 team does something on their side. I kept thinking that with python 3.13 they were going to expose subintepreters to the api but I guess that never went all the way (the module exists in 3.13 but is |
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
0c4b10c to
b1903da
Compare
Add a module to select a desired crypto caller. Update the callers to use the crypto caller interface. Signed-off-by: John Mulligan <jmulligan@redhat.com>
Previously, the internal crypto caller would catch (and convert) some errors when reading the cert but not all cases. Move the logic to catch the errors to a common location and do it once consistently. Signed-off-by: John Mulligan <jmulligan@redhat.com>
The cephadm modules needs to use python cryptography module for ssh (via asyncssh) and thus there's no need to use the remote crypto caller in cephadm. Configure cephadm to always use the internal cryptocaller. Signed-off-by: John Mulligan <jmulligan@redhat.com>
Add a mgr config option `crypto_caller` that lets a ceph user override the default behavior of using the remote crypto caller. Supported values are `internal` and `remote`. Signed-off-by: John Mulligan <jmulligan@redhat.com>
b1903da to
27c2050
Compare
|
no special failures I can see related to pyo3 stuff |
These patches were taken from PR #62951 [0] and backported onto upstream tag "19.2.2" (0eceb0defba). This provides a workaround for the PyO3 ImportError regarding sub-interpreters for the Ceph Dashboard. [0]: ceph/ceph#62951 Signed-off-by: Max R. Carrara <m.carrara@proxmox.com> Link: https://lore.proxmox.com/20250715093237.650039-2-m.carrara@proxmox.com
These patches were taken from PR #62951 [0] and backported onto upstream tag "19.2.2" (0eceb0defba). This provides a workaround for the PyO3 ImportError regarding sub-interpreters for the Ceph Dashboard. [0]: ceph/ceph#62951 Signed-off-by: Max R. Carrara <m.carrara@proxmox.com> Link: https://lore.proxmox.com/20250715093237.650039-2-m.carrara@proxmox.com Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
By moving the self-signed cert generation into a separate module and consequently running it via `subprocess.run`--so, as a new, independent Python process--the PyO3 ImportError is successfully avoided. Inspired by an upstream PR [0]. [0]: ceph/ceph#62951 Signed-off-by: Max R. Carrara <m.carrara@proxmox.com> Link: https://lore.proxmox.com/20250716173956.980112-2-m.carrara@proxmox.com
|
@phlogistonjohn shouldn't we backport this to tentacle as well? |
|
This has been in main for over a month and I haven't heard of any regressions related to these changes, so sure we can probably backport it to tentacle. |
Move the self-signed cert generation into a separate module inside python-common/ceph and run the module in a separate Python process. This provides a workaround for the ImportError thrown by PyO3 when the `restful` module is loaded in the context of multiple Python sub-interpreters being present. In particular, the ImportError is thrown by the `crypto` module of the `OpenSSL` package. Inspired by an upstream PR [0]. [0]: ceph#62951 Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
|
Will this be backported to reef or at least squid, as these versions are affected as well? It is currently blocking updates to reef/squid on Debian-based systems. |
|
I'm playing around with a cherry-picked squash of this branch for v19/Archlinux, the merge conflicts weren't so tricky, but I want to test it out a bit. I can then submit a proper backport for v19 if there's interest from the Ceph side. |
This is a backport of ceph/ceph#62951, rebased onto v19.2.3. All credit for this work goes to @phlogistonjohn. Upstream-ref: ceph/ceph@7094a5a References: ceph/ceph#62951
|
@phlogistonjohn hi, should pybind/mgr/cephadm/ssl_cert_utils.py use this interface ? |
Based on #61737
Fixes (workaround): https://tracker.ceph.com/issues/64213
This PR takes some of the work done by @pecastro along with suggestions by @epuertat and combines them into a series of patches that create two implementations with a common interface (via ABCs) for calling basic cryptographic and TSL/SSL related functionality. This interface is named CryptoCaller.
The new 'remote' implementation uses subprocesses to execute the cryptography functions outside of the mgr. This avoids conflicts between the pyo3 using libs (specifically cryptography) across sub-interpreters. The 'internal' implementation uses the library calls directly but does so through a class based interface. A helper module allows mgr code to choose or get the currently enabled (per sub-interpreter) crypto caller implementation.
The mgr utils are updated to use this interface. The cephadm module uses the internal implemenation. The dashboard defaults to the remote implementation but can be configured.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition