ENH: Expose original func as PipeFunc.__call__ #910

Draft
LennartGevers wants to merge 4 commits into pipefunc:main from LennartGevers:clean-decorator-behavior

LennartGevers (Contributor) commented Oct 26, 2025

Hi @basnijholt,
I took a shot at making the @pipefunc decorator less invasive, so that annotations and docstrings are preserved, as discussed in #902.

  1. Exposing the annotations is quite easy if done correctly with ParamSpec for the input annotations and a normal TypeVar for the output annotation.

  2. Exposing the documentation is a bit subtle and I currently see two different approaches here.

    In the case of

    @pipefunc("text")
    def function(a: int, b: str) -> str:
        """Desc.
        Args:
            a (int): _description_
            b (str): _description_
        Returns:
            str: _description_
        """
        return f"number {a}" + b
    
    
    function(1, "test")

    if the documentation should be readable when hovering the mouse over "function" on the last line, then the output annotation of the pipefunc function needs to be

    def pipefunc(...)-> Callable[[Callable[P, R]], Callable[P, R]]:

    but in that case you will get a type checking error for things like

    function.update_bounds(...)   # Cannot access attribute "update_bounds" for class "FunctionType". Attribute "update_bounds" is unknown

    on the other hand, if we use the annotation

    def pipefunc(...)-> Callable[[Callable[P, R]], PipeFunc[P, R]]:

    then

    function.update_scope(...) # Works as expected

    but then the documentation only appears when the cursor is inside the call parentheses (this is hard to describe, so just check the demo.py file I pushed). I think this would always be the standard behavior for PipeFuncs instantiated with PipeFunc.__init__.

I just wanted to push this draft for a quick review before diving into the subsequent changes that would still be needed so that this PipeFunc implementation also works within a Pipeline.

basnijholt (Collaborator) commented

Thank you for trying to fix this. I fully agree that if we could get this working, we should merge it.

@LennartGevers LennartGevers force-pushed the clean-decorator-behavior branch from 37efdf7 to 71aee6f Compare October 28, 2025 21:43
codspeed-hq bot commented Oct 28, 2025

CodSpeed Performance Report

Merging #910 will not alter performance

Comparing LennartGevers:clean-decorator-behavior (e98f882) with main (355584e)

Summary

✅ 6 untouched

github-actions (Contributor) commented

✅ PR Title Formatted Correctly

The title of this PR has been updated to match the correct format. Thank you!

LennartGevers (Contributor, Author) commented Oct 29, 2025

Hey @basnijholt,

this branch is ~90% there, but I think that the current solution is inferior to an alternative but more intricate design that’s worth discussing.

The main issue is that once you start using renames, defaults, or bound, the function’s signature changes at runtime and Python’s ParamSpec can’t express that. So we can’t properly expose the updated signature to type checkers or IDEs.

Example:

@pipefunc("f", defaults={"b": 0.5})
def f(a: int, b: float = 2) -> int:
    return a**b

f(4)      # 16  (original function, b = 2)
f.run(4)  # 2.0 (pipeline behavior, b = 0.5)

This works, but it feels weird that f and f.run behave differently. Having one behavior in a Pipeline context and another "normal" behavior does not feel intuitive to me. But as soon as signature-modifying args are involved, we lose the ability to type things statically, so at first this looks like the only option.
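
To make the split behavior concrete, here is a toy model of it. `TwoModeFunc` is a hypothetical class written for this illustration and does not reflect how the branch implements `__call__` or `run`; it only assumes that the plain call forwards to the original function while `run` fills in the overridden defaults.

```python
import inspect
from typing import Any, Callable


class TwoModeFunc:
    """Toy wrapper with a 'plain' call and a 'pipeline' call (illustrative only)."""

    def __init__(self, func: Callable[..., Any], defaults: dict[str, Any]) -> None:
        self.func = func
        self.defaults = dict(defaults)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Plain call: the original function with its original defaults.
        return self.func(*args, **kwargs)

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # Pipeline-style call: overridden defaults fill in missing arguments.
        bound = inspect.signature(self.func).bind_partial(*args, **kwargs)
        for name, value in self.defaults.items():
            bound.arguments.setdefault(name, value)
        return self.func(*bound.args, **bound.kwargs)


def power(a: int, b: float = 2) -> float:
    return a**b


f = TwoModeFunc(power, defaults={"b": 0.5})

plain = f(4)         # uses b = 2
piped = f.run(4)     # uses the overridden b = 0.5
explicit = f.run(4, 3)  # an explicit argument still wins over the override
```

A static type checker can describe `__call__` here through the wrapped signature, but nothing in the annotations can express that `run` uses a different default for `b`.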

I think the key insight is that there are really two kinds of attributes:
• ones that don’t change the signature (output_name, mapspec, etc.), and
• ones that do (renames, defaults, bound, maybe scope).

Right now they’re all handled in the same class, which complicates typing and behavior.

So my idea is to split them:
Func: lightweight, immutable, only stores pipeline metadata (like a frozen dataclass) with no signature magic.
PipeFunc: wraps a Func and adds the dynamic behavior (renames, defaults, etc.).

@func would be the minimal decorator that keeps IDE hovers and types perfect, while @pipefunc(...) or PipeFunc(...) gives you the full power and mutability. As soon as a Func is passed to a Pipeline it is promoted to a PipeFunc so that it can respond to Pipeline.add_mapspec_axis and so on.

We could alternatively overload pipefunc so it returns a Func when no signature-modifying args are given, and a PipeFunc otherwise.

@overload
def pipefunc(..., renames=None, defaults=None, bound=None) -> Func: ...
@overload
def pipefunc(..., renames=..., defaults=..., bound=...) -> PipeFunc: ...
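
The overload idea can be written out in runnable form for a single signature-modifying argument. This is a sketch under assumptions: `sketch_pipefunc`, `SketchFunc`, and `SketchPipeFunc` are hypothetical names, and only `defaults` is handled. Covering `renames` and `bound` the same way would likely need one overload per argument so that the overloads do not overlap.

```python
from __future__ import annotations

from typing import Any, Callable, overload


class SketchFunc:
    """Hypothetical immutable layer (stand-in for the proposed Func)."""

    def __init__(self, func: Callable[..., Any], output_name: str) -> None:
        self.func = func
        self.output_name = output_name


class SketchPipeFunc:
    """Hypothetical dynamic layer (stand-in for the proposed PipeFunc)."""

    def __init__(self, inner: SketchFunc, defaults: dict[str, Any]) -> None:
        self.inner = inner
        self.defaults = dict(defaults)


@overload
def sketch_pipefunc(
    output_name: str, *, defaults: None = None
) -> Callable[[Callable[..., Any]], SketchFunc]: ...
@overload
def sketch_pipefunc(
    output_name: str, *, defaults: dict[str, Any]
) -> Callable[[Callable[..., Any]], SketchPipeFunc]: ...
def sketch_pipefunc(
    output_name: str, *, defaults: dict[str, Any] | None = None
) -> Callable[[Callable[..., Any]], SketchFunc | SketchPipeFunc]:
    def decorator(func: Callable[..., Any]) -> SketchFunc | SketchPipeFunc:
        base = SketchFunc(func, output_name)
        # Only build the dynamic layer when a signature-modifying arg is given.
        return base if defaults is None else SketchPipeFunc(base, defaults)

    return decorator


@sketch_pipefunc("f")
def f(a: int, b: float = 2) -> float:
    return a**b


@sketch_pipefunc("g", defaults={"b": 0.5})
def g(a: int, b: float = 2) -> float:
    return a**b
```

Here a type checker should infer `SketchFunc` for `f` and `SketchPipeFunc` for `g`, which matches what the decorator returns at runtime.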

But to be really honest, I think that a function which effectively has stateful behaviour is an anti-pattern. State should be reserved exclusively for OOP. So maybe we should only advocate instantiating PipeFuncs with their constructor. I think the sum of all of these behaviours (side effects, mutability, no signature and docs, ...) is why people often prefer to instantiate a PipeFunc with its constructor anyway.

Differentiating the Func and PipeFunc layer would separate concerns nicely, simplify the internals, and avoid weird side effects from mutating a live PipeFunc, like

@pipefunc("f")
def f(a: int, b: float = 2) -> int:
    return a**b

h = f
h.update_defaults({"b": 0.5})
f(4)  # 2
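
For contrast, an immutable layer would sidestep this aliasing problem by returning a new object on every update. The following is only a sketch: `FrozenFunc` and its `with_defaults` method are hypothetical and are not part of pipefunc or of this PR.

```python
import inspect
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class FrozenFunc:
    """Illustrative immutable wrapper: updates produce a copy."""

    func: Callable[..., Any]
    defaults: Mapping[str, Any] = field(default_factory=dict)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Fill in overridden defaults for arguments the caller did not pass.
        bound = inspect.signature(self.func).bind_partial(*args, **kwargs)
        for name, value in self.defaults.items():
            bound.arguments.setdefault(name, value)
        return self.func(*bound.args, **bound.kwargs)

    def with_defaults(self, defaults: Mapping[str, Any]) -> "FrozenFunc":
        # Return a modified copy instead of mutating self.
        return replace(self, defaults={**self.defaults, **defaults})


def power(a: int, b: float = 2) -> float:
    return a**b


f = FrozenFunc(power)
h = f.with_defaults({"b": 0.5})

f_result = f(4)  # f is unaffected by the update made through h
h_result = h(4)
```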

What do you think about that, @basnijholt?
