Skip to content

Conversation

@phlogistonjohn
Copy link
Contributor

@phlogistonjohn phlogistonjohn commented Apr 24, 2025

Based on #61737
Fixes (workaround): https://tracker.ceph.com/issues/64213

This PR takes some of the work done by @pecastro along with suggestions by @epuertat and combines them into a series of patches that create two implementations with a common interface (via ABCs) for calling basic cryptographic and TSL/SSL related functionality. This interface is named CryptoCaller.

The new 'remote' implementation uses subprocesses to execute the cryptography functions outside of the mgr. This avoids conflicts between the pyo3 using libs (specifically cryptography) across sub-interpreters. The 'internal' implementation uses the library calls directly but does so through a class based interface. A helper module allows mgr code to choose or get the currently enabled (per sub-interpreter) crypto caller implementation.

The mgr utils are updated to use this interface. The cephadm module uses the internal implemenation. The dashboard defaults to the remote implementation but can be configured.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@phlogistonjohn
Copy link
Contributor Author

https://pulpito.ceph.com/adking-2025-04-29_23:34:48-orch:cephadm-jjm-wip-pycry-remoted-distro-default-smithi/
This test run with this PR applied + a "tainted" container image ran many tests successfully.

@phlogistonjohn phlogistonjohn changed the title [WIP] mgr:python: avoid pyo3 errors by running certain crypgraphic functions in a child process mgr:python: avoid pyo3 errors by running certain crypgraphic functions in a child process Apr 30, 2025
@phlogistonjohn
Copy link
Contributor Author

I'm ready for initial reviews: PTAL @epuertat @pecastro @adk3798

Feel free to ignore the experimental module shield commit for now, I will remove it when I take this out of draft, ideally early next week.

@pecastro
Copy link
Contributor

LGTM.

I just have a question around the code duplication, I can see it defaulting to the remote option, which is the forked subprocess. Great.
How are you planning on exercising the internal branch to make sure that other piece of code doesn't bitrot ?

@phlogistonjohn
Copy link
Contributor Author

LGTM.

I just have a question around the code duplication, I can see it defaulting to the remote option, which is the forked subprocess. Great. How are you planning on exercising the internal branch to make sure that other piece of code doesn't bitrot ?

Good Q: The remote default impacts the dashboard. The cephadm module defaults to using internal both modules make use of a number of the crypto caller functions. Functional testing (teuthology) that uses both modules should get us fair coverage of both options.

The unit testing story is a bit more complicated with the extremely heavy use of mocks that mgr module units tests use. In that case I think it's good enough to let all code use the default as that's the more complex path. That said, we could probably add a set of dedicated "unit" tests to the new dir for focused testing on the new interface and backend implementations. I will look into that - but perhaps in a follow up PR.

@phlogistonjohn
Copy link
Contributor Author

One thing I meant to note in my previous comment: when I mention that the remote path is the complex one be aware that the remote crypto caller eventually calls the internal crypto caller - so every time the remote caller is used the internal caller gets used too (in the child process). It's just indirect.

@phlogistonjohn
Copy link
Contributor Author

jenkins test make check arm64

@phlogistonjohn
Copy link
Contributor Author

@epuertat - I am ready to take this out of draft but I'd like a general "thumbs up/down" from you before I do so. I don't need a full review at the moment just a quick check on the overall approach.

@phlogistonjohn
Copy link
Contributor Author

@epuertat
Copy link
Member

epuertat commented May 6, 2025

@epuertat - I am ready to take this out of draft but I'd like a general "thumbs up/down" from you before I do so. I don't need a full review at the moment just a quick check on the overall approach.

Looks good to me, @phlogistonjohn !

When I saw the "remote" term over there, I remembered that someone (@batrick) suggested to have these crypto methods in a new helper-only mgr module, and use the mgr.remote() method to invoke them. It'd be a middle ground between the sub-interpreter and the stand-alone interpreter approaches.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I saw the "remote" term over there, I remembered that someone (@batrick) suggested to have these crypto methods in a new helper-only mgr module, and use the mgr.remote() method to invoke them. It'd be a middle ground between the sub-interpreter and the stand-alone interpreter approaches.

Thanks for the ping @epuertat . Note I don't think we typically fork in Ceph processes so this may run afoul of restrictive container settings like NPROC=1. Something to be on the lookout for. A pending release note would be appropriate.

# sharing the same protocol.
# For instance we could have a persistent process listening on a unix
# socket accepting the crypto functions as RPCs. For now, we keep it
# simple though :-)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

name = os.environ.get(_CC_ENV, '')
_check_name(name)
if not name:
name = _CC_REMOTE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to attempt an internal import and fallback to remote?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, debug messages for these branches would be a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug sounds good. Unfortunately, an attempt to import the module in the "wrong" subintepreter that succeeds will pollute the system. IOW, if we import the dashboard and it probes the cryptography module and it succeeds it will then break a subsequent import of cephadm. So it's not really "safe" to try to import it proactively.



def main() -> None:
# create the top-level parser
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is a generic mechanism for this. It has to be somewhat common.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you are referring to here, setting up an argument parser?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean a python module that is providing RPCs over a socket / pipe.

@phlogistonjohn
Copy link
Contributor Author

The idea of moving the use of cryptography into a central mgr module was discussed in a ceph orch weekly. One of the issues we discussed was that cephadm "needs cryptography the most" so that it can use asyncssh. Thus we'd probably need to do it in a less clean way with cephadm handling the crypto calls - rather than a nice central crypto module that does one thing assisting the other modules.

Of course, all of this is a hopefully temporary workaround once the pyo3 team does something on their side. I kept thinking that with python 3.13 they were going to expose subintepreters to the api but I guess that never went all the way (the module exists in 3.13 but is _interpreters so not totally official). So my hope that would motivate the pyo3 devs is probably irrelevant now.

@phlogistonjohn phlogistonjohn marked this pull request as ready for review May 8, 2025 17:34
@phlogistonjohn phlogistonjohn requested review from a team as code owners May 8, 2025 17:34
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Add a module to select a desired crypto caller. Update the callers
to use the crypto caller interface.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
Previously, the internal crypto caller would catch (and convert) some
errors when reading the cert but not all cases. Move the logic to catch
the errors to a common location and do it once consistently.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
The cephadm modules needs to use python cryptography module for ssh (via
asyncssh) and thus there's no need to use the remote crypto caller in
cephadm. Configure cephadm to always use the internal cryptocaller.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
Add a mgr config option `crypto_caller` that lets a ceph user override
the default behavior of using the remote crypto caller. Supported
values are `internal` and `remote`.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
@phlogistonjohn phlogistonjohn changed the title mgr:python: avoid pyo3 errors by running certain crypgraphic functions in a child process mgr:python: avoid pyo3 errors by running certain cryptographic functions in a child process Jul 14, 2025
@adk3798
Copy link
Contributor

adk3798 commented Jul 17, 2025

@adk3798 adk3798 merged commit 7094a5a into ceph:main Jul 17, 2025
18 of 19 checks passed
@github-project-automation github-project-automation bot moved this from Reviewer approved to Done in Ceph-Dashboard Jul 17, 2025
@phlogistonjohn phlogistonjohn deleted the jjm-pe-crypto-plus branch July 23, 2025 21:04
ProxBot pushed a commit to proxmox/ceph that referenced this pull request Jul 28, 2025
These patches were taken from PR #62951 [0] and backported onto
upstream tag "19.2.2" (0eceb0defba).

This provides a workaround for the PyO3 ImportError regarding
sub-interpreters for the Ceph Dashboard.

[0]: ceph/ceph#62951

Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
Link: https://lore.proxmox.com/20250715093237.650039-2-m.carrara@proxmox.com
ProxBot pushed a commit to proxmox/ceph that referenced this pull request Jul 28, 2025
These patches were taken from PR #62951 [0] and backported onto
upstream tag "19.2.2" (0eceb0defba).

This provides a workaround for the PyO3 ImportError regarding
sub-interpreters for the Ceph Dashboard.

[0]: ceph/ceph#62951

Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
Link: https://lore.proxmox.com/20250715093237.650039-2-m.carrara@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
ProxBot pushed a commit to proxmox/ceph that referenced this pull request Jul 28, 2025
By moving the self-signed cert generation into a separate module and
consequently running it via `subprocess.run`--so, as a new,
independent Python process--the PyO3 ImportError is successfully
avoided.

Inspired by an upstream PR [0].

[0]: ceph/ceph#62951

Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
Link: https://lore.proxmox.com/20250716173956.980112-2-m.carrara@proxmox.com
@nizamial09
Copy link
Member

@phlogistonjohn shouldn't we backport this to tentacle as well?

@phlogistonjohn
Copy link
Contributor Author

This has been in main for over a month and I haven't heard of any regressions related to these changes, so sure we can probably backport it to tentacle.

Aequitosh added a commit to Aequitosh/ceph that referenced this pull request Sep 12, 2025
Move the self-signed cert generation into a separate module
inside python-common/ceph and run the module in a separate Python
process.

This provides a workaround for the ImportError thrown by PyO3 when
the `restful` module is loaded in the context of multiple Python
sub-interpreters being present. In particular, the ImportError is
thrown by the `crypto` module of the `OpenSSL` package.

Inspired by an upstream PR [0].

[0]: ceph#62951

Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
@rsommer
Copy link
Contributor

rsommer commented Sep 30, 2025

Will this be backported to reef or at least squid, as these versions are affected as well? It is currently blocking updates to reef/squid on Debian-based systems.

@bazaah
Copy link
Contributor

bazaah commented Oct 2, 2025

I'm playing around with a cherry-picked squash of this branch for v19/Archlinux, the merge conflicts weren't so tricky, but I want to test it out a bit. I can then submit a proper backport for v19 if there's interest from the Ceph side.

bazaah added a commit to bazaah/aur-ceph that referenced this pull request Oct 5, 2025
This is a backport of ceph/ceph#62951, rebased onto v19.2.3.
All credit for this work goes to @phlogistonjohn.

Upstream-ref: ceph/ceph@7094a5a
References: ceph/ceph#62951
@qiuxinyidian
Copy link
Contributor

@phlogistonjohn hi, should pybind/mgr/cephadm/ssl_cert_utils.py use this interface ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

10 participants