-
-
Notifications
You must be signed in to change notification settings - Fork 11.9k
ENH: Making np.finfo accept finfo coming from user-defined data type
#29771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| self._str_tiny = str(finfo_obj.get("smallest_normal")) | ||
| self._str_max = str(self.max) | ||
| self._str_epsneg = str(self.epsneg) | ||
| self._str_eps = str(self.eps) | ||
| self._str_resolution = str(self.resolution) | ||
| self._str_smallest_normal = str(finfo_obj.get("smallest_normal")) | ||
| self._str_smallest_subnormal = str(self.smallest_subnormal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will be the string "None" if not present. I don't think that's what we want here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes thanks for catching this, I will update this to have None (like others)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should make any of these fields None-able, actually. Because if we do, then that would make this a backwards incompatible change. For example, consider this function:
import numpy as np
def half_epsilon[T: np.inexact](dtype: type[T] | np.dtype[T]) -> T:
return np.finfo(dtype).eps / 2Currently this works fine for all inexact dtypes. But with this change, valid input types could have finfo.eps: None, resulting in a TypeError.
So downstream libraries will have to make changes in code like this so that it handles the case each of the possible None cases.
I think it would be better to require all of the attributes to be defined, and of the same type as the np.finfo attributes. Put differently; this should stay the way it is:
Lines 5796 to 5812 in 0aa8087
| class finfo(Generic[_FloatingT_co]): | |
| dtype: Final[dtype[_FloatingT_co]] | |
| bits: Final[int] | |
| eps: Final[_FloatingT_co] | |
| epsneg: Final[_FloatingT_co] | |
| iexp: Final[int] | |
| machep: Final[int] | |
| max: Final[_FloatingT_co] | |
| maxexp: Final[int] | |
| min: Final[_FloatingT_co] | |
| minexp: Final[int] | |
| negep: Final[int] | |
| nexp: Final[int] | |
| nmant: Final[int] | |
| precision: Final[int] | |
| resolution: Final[_FloatingT_co] | |
| smallest_subnormal: Final[_FloatingT_co] |
I think that's a fair requirement. Because if your user dtype doesn't have one of these attributes, then it shouldn't even be an
inexact subtype in the first place, as that would violate the Liskov substitution principle.
| self.dtype = numeric.dtype(dtype) | ||
| self.precision = finfo_obj.get('precision') | ||
| self.iexp = finfo_obj.get('iexp') | ||
| self.maxexp = finfo_obj.get('maxexp') | ||
| self.minexp = finfo_obj.get('minexp') | ||
| self.negep = finfo_obj.get('negep') | ||
| self.machep = finfo_obj.get('machep') | ||
| self.resolution = finfo_obj.get('resolution') | ||
| self.epsneg = finfo_obj.get('epsneg') | ||
| self.smallest_subnormal = finfo_obj.get('smallest_subnormal') | ||
| self.bits = self.dtype.itemsize * 8 | ||
| self.max = finfo_obj.get('max') | ||
| self.min = finfo_obj.get('min') | ||
| self.eps = finfo_obj.get('eps') | ||
| self.nexp = finfo_obj.get('nexp') | ||
| self.nmant = finfo_obj.get('nmant') | ||
| self._machar = finfo_obj.get('machar') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The effect of this is that each one of these finfo attributes can now also be None. This would require users to add if ... is None in every case where one of these attributes is used.
That makes this a backwards incompatible change. It would also require updating the stubs to reflect this:
Lines 5796 to 5822 in 5f80045
| class finfo(Generic[_FloatingT_co]): | |
| dtype: Final[dtype[_FloatingT_co]] | |
| bits: Final[int] | |
| eps: Final[_FloatingT_co] | |
| epsneg: Final[_FloatingT_co] | |
| iexp: Final[int] | |
| machep: Final[int] | |
| max: Final[_FloatingT_co] | |
| maxexp: Final[int] | |
| min: Final[_FloatingT_co] | |
| minexp: Final[int] | |
| negep: Final[int] | |
| nexp: Final[int] | |
| nmant: Final[int] | |
| precision: Final[int] | |
| resolution: Final[_FloatingT_co] | |
| smallest_subnormal: Final[_FloatingT_co] | |
| @property | |
| def smallest_normal(self) -> _FloatingT_co: ... | |
| @property | |
| def tiny(self) -> _FloatingT_co: ... | |
| @overload | |
| def __new__(cls, dtype: inexact[_NBit1] | _DTypeLike[inexact[_NBit1]]) -> finfo[floating[_NBit1]]: ... | |
| @overload | |
| def __new__(cls, dtype: complex | type[complex]) -> finfo[float64]: ... | |
| @overload | |
| def __new__(cls, dtype: str) -> finfo[floating]: ... |
So I think it's better to "fail fast", and raise an appropriate error if one of these keys isn't present in finfo_obj.
| self._str_smallest_subnormal = machar._str_smallest_subnormal.strip() | ||
| return self | ||
|
|
||
| def _init_from_finfo_obj(self, dtype, finfo_obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at the code, it looks like finfo_obj is supposed to be a collections.abc.Mapping (dict-like) type, is that right?
Either way, this is rather important assumption that should be clearly documented (as well as any other implicit assumption for that matter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be dict-like yes. In quaddtype I implemented this as dataclass with its own get method (just for syntax familiarity of returning None when attribute not present)
numpy/numpy-user-dtypes#159
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, something flexible like that cannot be properly annotate, because then we wouldn't be able to use a TypedDict.
| if hasattr(dtype, "finfo") and issubclass(dtype.type, numeric.inexact): | ||
| # User-defined dtype with finfo support | ||
| finfo_obj = dtype.finfo() | ||
| obj = object.__new__(cls)._init_from_finfo_obj(dtype, finfo_obj) | ||
| return obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stubs should also be updated to accept these kinds of (structural) types:
Lines 5817 to 5822 in 5f80045
| @overload | |
| def __new__(cls, dtype: inexact[_NBit1] | _DTypeLike[inexact[_NBit1]]) -> finfo[floating[_NBit1]]: ... | |
| @overload | |
| def __new__(cls, dtype: complex | type[complex]) -> finfo[float64]: ... | |
| @overload | |
| def __new__(cls, dtype: str) -> finfo[floating]: ... |
For that, you'd have to write a typing.Protocol that implements a finfo method and a type property (not an attribute) that returns a generic _T: np.inexact, since we want to return type to be annotated as np.finfo[_T]. The finfo method should (presumably) return a typing.TypedDict.
If you didn't get any of what I just said; then that's normal. If that's the case, then I'd be happy to help.
|
Sorry I had not thought about this deeply (and I still fear I may not be quite settling in thoughts).
This PR does 2 pretty nicely, with one caveat I am not sure about. What should The direction of going to C, I think would look a bit different and given your previous PR, I think the API I would go towards is: This would give us a fast and simple way to query this on the C-side (that we could also expose as public C-API while keeping the definition of the Python side One reason I like this, is that we really could use a way to fetch zero easily for example. Now, that doesn't actually allow us to add the EDIT: Ahh, but a lot of the constants here are integers. Maybe that is fine, we just have to define them to be |
|
Thanks @seberg for checking it out
So in the previous PR the discussion was closed as for now (just for the sake of availability) we can register this in python side and for a complete packed solution, we need to plan thoroughly for the hierarchy of the numeric dtypes, implementing abstract Super classes (that user defined dytpes can inherit, which also allows them to access methods like
In quaddtype we are doing it as a dataclass, although one point that the
Yes or we can also register |
Right, but we can write a generic version in NumPy that initializes I thought about it a bit more and came to this public API for your side. Might be missing a few values, but I doubt they are important and many values in finfo can be calculated from others so it would be missing few things, I think. (See details, mild modifications from the above, like I think maybe just guaranteeing GIL is good for this -- nobody needs all constants, you can normally query them ahead of time, IMO.) I really like this, because it also allows a bit saner handling for ufunc initial values. They are already adjustable on a per-loop basis, but this might be a great way to get to a default that is basically always right, and saner than what we have right now. Details |
mhvk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments on the implementation.
It does feel more logical for dtype.finfo to be a property, and it seems to me it might as well return an object with attributes rather than a mapping. Of course, that means that effectively it just returns what for all practical purposes is a finfo instance. So, maybe it should just be a finfo instance, perhaps one defined in C, with slots...
p.s. I still like the idea that only inexact dtype have the finfo, though perhaps the C implementation can simply ensure that if it is accessed on the python side but not used, an AttributeError is raised.
| self._str_smallest_subnormal = machar._str_smallest_subnormal.strip() | ||
| return self | ||
|
|
||
| def _init_from_finfo_obj(self, dtype, finfo_obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually, this really is a class method, so I'd do
@classmethod
def _from_dtype_finfo(cls, dtype):
self = object.__new__(cls):
finfo = dtype.finfo() # Would remove trailing "_obj".
self.dtype = ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not! It should be a method, we should support e.g. mpf floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry mindslip, thought this was somethig else :/
|
|
||
| def _init_from_finfo_obj(self, dtype, finfo_obj): | ||
| self.dtype = numeric.dtype(dtype) | ||
| self.precision = finfo_obj.get('precision') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd tend to make this a loop, as is done in _init:
for attr in ("precision", ....):
setattr(self, attr, finfo.get(attr))
Though it does suggest that perhaps finfo should be something that has attributes instead of a mapping... (again like is done in _init -- indeed, with a bit of refactoring of _get_machar, it may be possible to factor out some of the _init code for and use it here).
| if hasattr(dtype, "finfo") and issubclass(dtype.type, numeric.inexact): | ||
| # User-defined dtype with finfo support | ||
| finfo_obj = dtype.finfo() | ||
| obj = object.__new__(cls)._init_from_finfo_obj(dtype, finfo_obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the class method suggested below, this becomes return cls._from_dtype_finfo(dtype)
| # User-defined dtype with finfo support | ||
| finfo_obj = dtype.finfo() | ||
| obj = object.__new__(cls)._init_from_finfo_obj(dtype, finfo_obj) | ||
| return obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be put in _finfo_cache?
|
@seberg - I like the slots idea! But why the difference between zero_fill and zero? (etc.) |
|
Really for EDIT: To be fair, an explicit empty/ones function may be better for these, but maybe it's still good, if just to clarify the distinction. |
|
|
At least in the finfo docstring, it is mentioned that many props are floats, ( numpy/numpy/_core/getlimits.py Line 400 in 0532af4
dtype.type and also be typed that way.
|
the docs are wrong then: >>> import numpy as np
>>> f = np.finfo(np.float32)
>>> f.eps, f.epsneg, f.resolution, f.smallest_subnormal, f.max, f.min
(np.float32(1.1920929e-07), np.float32(5.9604645e-08), np.float32(1e-06), np.float32(1e-45), np.float32(3.4028235e+38), np.float32(-3.4028235e+38))Note that the stubs annotate this correctly |
Ah, yes, most useful is the version that in multiplication with itself stays identical, but then for strings that just doesn't exist, so it would be empty. Arguably, one doesn't really need another version, but you're right that one might as well support the use in |
|
Actually, I meandered back: We already have a zero's special case and that is really the one interesting one (empty is just for objects right now, and ones is just a cast). I would like to use this for simple identities in ufuncs -- it looks like it might be slightly messier due to Python objects (but not sure). |
|
Okay so I belive we are confidently leaning towards the C slots API with some baseclass to register corresponding One other point on the lazy lookup is that, this way the finfo cache would be filled as per the user's lookup pattern (unlike the current which loads everything in once) which I don't think would be an issue, other side it can resolve the issue of none-able quantities, by giving undefined error |
|
I had looked at doing this a bit today and got to: #29778 It basically works, certainly needs polishing/tests. And I am happy if you just use that to make a new PR with my start. Unfortunately, there is some fallout, just because I wanted to use the new @SwayamInSync I doubt I have a lot of time this week, but if you like don't hesitate to take over. (And e.g. undo the |
|
Thanks @seberg I'll take it from there and happy to see |
|
close/reopen |
|
Closing this one as we made the good progress in #29836 |
Follow up of #29763
As per the suggestions, the workaround in this PR is just modifying the
getlimits.pyand user-defined data type must implement thefinfo()method.cc: @ngoldbaum @seberg @mhvk