
Conversation

@SwayamInSync
Member

Follow up of #29763

As per the suggestions, the workaround in this PR just modifies getlimits.py; a user-defined data type must implement a finfo() method.

cc: @ngoldbaum @seberg @mhvk

Comment on lines +598 to +604
self._str_tiny = str(finfo_obj.get("smallest_normal"))
self._str_max = str(self.max)
self._str_epsneg = str(self.epsneg)
self._str_eps = str(self.eps)
self._str_resolution = str(self.resolution)
self._str_smallest_normal = str(finfo_obj.get("smallest_normal"))
self._str_smallest_subnormal = str(self.smallest_subnormal)
Member

These will be the string "None" if not present. I don't think that's what we want here.
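For illustration, the failure mode here is just that str() happily turns a missing value into the literal string "None" (a minimal stdlib-only sketch; the mapping is hypothetical):

```python
# Minimal illustration of the pitfall noted above: str() turns a missing
# value into the literal string "None".
finfo_obj = {}  # hypothetical finfo mapping; "smallest_normal" is absent

value = finfo_obj.get("smallest_normal")
broken = str(value)                              # the string "None"
guarded = None if value is None else str(value)  # stays the object None

print(broken, guarded)
```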

Member Author

Ah yes, thanks for catching this. I will update these to be None (like the others).

Member

I don't think we should make any of these fields None-able, actually. If we do, that would make this a backwards-incompatible change. For example, consider this function:

import numpy as np

def half_epsilon[T: np.inexact](dtype: type[T] | np.dtype[T]) -> T:
    return np.finfo(dtype).eps / 2

Currently this works fine for all inexact dtypes. But with this change, valid input types could have finfo.eps set to None, resulting in a TypeError.

So downstream libraries will have to change code like this to handle each of the possible None cases.

I think it would be better to require all of the attributes to be defined, and of the same type as the np.finfo attributes. Put differently; this should stay the way it is:

numpy/numpy/__init__.pyi

Lines 5796 to 5812 in 0aa8087

class finfo(Generic[_FloatingT_co]):
    dtype: Final[dtype[_FloatingT_co]]
    bits: Final[int]
    eps: Final[_FloatingT_co]
    epsneg: Final[_FloatingT_co]
    iexp: Final[int]
    machep: Final[int]
    max: Final[_FloatingT_co]
    maxexp: Final[int]
    min: Final[_FloatingT_co]
    minexp: Final[int]
    negep: Final[int]
    nexp: Final[int]
    nmant: Final[int]
    precision: Final[int]
    resolution: Final[_FloatingT_co]
    smallest_subnormal: Final[_FloatingT_co]

I think that's a fair requirement. Because if your user dtype doesn't have one of these attributes, then it shouldn't even be an inexact subtype in the first place, as that would violate the Liskov substitution principle.

Comment on lines +581 to +597
self.dtype = numeric.dtype(dtype)
self.precision = finfo_obj.get('precision')
self.iexp = finfo_obj.get('iexp')
self.maxexp = finfo_obj.get('maxexp')
self.minexp = finfo_obj.get('minexp')
self.negep = finfo_obj.get('negep')
self.machep = finfo_obj.get('machep')
self.resolution = finfo_obj.get('resolution')
self.epsneg = finfo_obj.get('epsneg')
self.smallest_subnormal = finfo_obj.get('smallest_subnormal')
self.bits = self.dtype.itemsize * 8
self.max = finfo_obj.get('max')
self.min = finfo_obj.get('min')
self.eps = finfo_obj.get('eps')
self.nexp = finfo_obj.get('nexp')
self.nmant = finfo_obj.get('nmant')
self._machar = finfo_obj.get('machar')
Member

The effect of this is that each of these finfo attributes can now also be None. This would require users to add an if ... is None check in every place one of these attributes is used.
That makes this a backwards-incompatible change. It would also require updating the stubs to reflect this:

numpy/numpy/__init__.pyi

Lines 5796 to 5822 in 5f80045

class finfo(Generic[_FloatingT_co]):
    dtype: Final[dtype[_FloatingT_co]]
    bits: Final[int]
    eps: Final[_FloatingT_co]
    epsneg: Final[_FloatingT_co]
    iexp: Final[int]
    machep: Final[int]
    max: Final[_FloatingT_co]
    maxexp: Final[int]
    min: Final[_FloatingT_co]
    minexp: Final[int]
    negep: Final[int]
    nexp: Final[int]
    nmant: Final[int]
    precision: Final[int]
    resolution: Final[_FloatingT_co]
    smallest_subnormal: Final[_FloatingT_co]
    @property
    def smallest_normal(self) -> _FloatingT_co: ...
    @property
    def tiny(self) -> _FloatingT_co: ...
    @overload
    def __new__(cls, dtype: inexact[_NBit1] | _DTypeLike[inexact[_NBit1]]) -> finfo[floating[_NBit1]]: ...
    @overload
    def __new__(cls, dtype: complex | type[complex]) -> finfo[float64]: ...
    @overload
    def __new__(cls, dtype: str) -> finfo[floating]: ...

So I think it's better to "fail fast", and raise an appropriate error if one of these keys isn't present in finfo_obj.
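A fail-fast check along those lines could look something like this (a sketch only; require_finfo_keys and the key list are hypothetical names for illustration, not proposed API):

```python
# Fail-fast sketch: raise a clear error instead of silently defaulting
# to None when a required finfo entry is absent.
# (require_finfo_keys and REQUIRED_FINFO_KEYS are hypothetical names.)
REQUIRED_FINFO_KEYS = ("precision", "eps", "epsneg", "max", "min", "resolution")

def require_finfo_keys(finfo_obj):
    missing = [key for key in REQUIRED_FINFO_KEYS if finfo_obj.get(key) is None]
    if missing:
        raise ValueError(
            f"user dtype finfo() is missing required entries: {missing}"
        )
    return finfo_obj
```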

self._str_smallest_subnormal = machar._str_smallest_subnormal.strip()
return self

def _init_from_finfo_obj(self, dtype, finfo_obj):
Member

By looking at the code, it looks like finfo_obj is supposed to be a collections.abc.Mapping (dict-like) type, is that right?

Either way, this is a rather important assumption that should be clearly documented (as well as any other implicit assumptions, for that matter).

Member Author

It can be dict-like, yes. In quaddtype I implemented this as a dataclass with its own get method (just for the syntax familiarity of returning None when an attribute is not present)
numpy/numpy-user-dtypes#159
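The dataclass-with-a-get-method approach described above could be sketched like this (field names are illustrative; this is not the actual quaddtype implementation):

```python
from dataclasses import dataclass

# Sketch of a dataclass that mimics dict.get(), so call sites written
# against a mapping keep working. (Illustrative, not the real quaddtype code.)
@dataclass
class FInfoData:
    eps: object = None
    max: object = None
    smallest_normal: object = None

    def get(self, name, default=None):
        # dict-like lookup: returns the default (None) for absent attributes
        return getattr(self, name, default)
```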

Member

Hmm, something flexible like that cannot be properly annotated, because then we wouldn't be able to use a TypedDict.

Comment on lines +519 to +523
if hasattr(dtype, "finfo") and issubclass(dtype.type, numeric.inexact):
# User-defined dtype with finfo support
finfo_obj = dtype.finfo()
obj = object.__new__(cls)._init_from_finfo_obj(dtype, finfo_obj)
return obj
Member

The stubs should also be updated to accept these kinds of (structural) types:

numpy/numpy/__init__.pyi

Lines 5817 to 5822 in 5f80045

@overload
def __new__(cls, dtype: inexact[_NBit1] | _DTypeLike[inexact[_NBit1]]) -> finfo[floating[_NBit1]]: ...
@overload
def __new__(cls, dtype: complex | type[complex]) -> finfo[float64]: ...
@overload
def __new__(cls, dtype: str) -> finfo[floating]: ...

For that, you'd have to write a typing.Protocol that implements a finfo method and a type property (not an attribute) that returns a generic _T: np.inexact, since we want the return type to be annotated as np.finfo[_T]. The finfo method should (presumably) return a typing.TypedDict.
If you didn't get any of what I just said, then that's normal. If that's the case, then I'd be happy to help.
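A rough, non-authoritative sketch of the structural type described above (names are illustrative, and the generic type property returning an np.inexact subclass is omitted for brevity; the real stubs would need it):

```python
from typing import Protocol, TypedDict, runtime_checkable

# Sketch only: a TypedDict for the finfo payload and a Protocol for any
# dtype that provides a finfo() method. These are not the actual numpy stubs.
class FInfoDict(TypedDict):
    eps: float
    max: float
    min: float

@runtime_checkable
class HasFInfo(Protocol):
    def finfo(self) -> FInfoDict: ...

class FakeDType:
    # Structural match: no inheritance from HasFInfo is needed.
    def finfo(self) -> FInfoDict:
        return {"eps": 2.0 ** -52, "max": 1.7e308, "min": -1.7e308}
```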

@seberg
Member

seberg commented Sep 20, 2025

Sorry, I had not thought about this deeply (and I still fear my thoughts may not be quite settled).
My thinking always went in two opposing directions, one

  1. should we move more towards C?
  2. should we just make get_finfo return the finfo object (and make it a "python side" problem).

This PR does 2 pretty nicely, with one caveat I am not sure about: what should dtype.finfo return (I would say it should be a property)? A dict feels wrong; a dataclass/protocol (without further typing) might be great, though.
One detail about this (and the previous PR) is that I somewhat think we should make the finfo object look up attributes lazily. So if a value cannot be defined, that only affects things at lookup time.
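The lazy-lookup idea above could be sketched like this (LazyFInfo is a hypothetical illustration, not proposed API): a value that cannot be defined only fails when it is actually accessed, not when the finfo object is constructed.

```python
# Sketch of lazy attribute lookup: undefined quantities raise only at
# lookup time. (LazyFInfo is a hypothetical stand-in, not numpy code.)
class LazyFInfo:
    def __init__(self, provider):
        self._provider = provider  # mapping: attribute name -> value (or absent)

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails
        try:
            return self._provider[name]
        except KeyError:
            raise AttributeError(f"{name!r} is not defined for this dtype") from None
```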

The direction of going to C, I think would look a bit different and given your previous PR, I think the API I would go towards is:

// Yes slots, I am still adamant that slots are correct for something like
// finfo that is deeply tied to the dtype/NumPy (as opposed to registering it differently).
// NOTE: I am intentionally making these two different slots, because it allows NumPy
// to add the `.finfo` attribute depending on whether or not the finfo/iinfo slot is used.
NPY_DT_get_constant

typedef enum {
    NPY_CONST_zero,  // yes, we are missing this :).
    NPY_CONST_finfo_max,  // maybe no finfo_ prefix, but need to be clear it is "max, not inf"
    NPY_CONST_finfo_smallest_normal,
    ...
} NPY_CONSTANT_REQUEST;

static int 
MY_DTYPE_get_constant(PyArray_Descr *self, NPY_CONSTANT_REQUEST request, void *data)
{
    // data guaranteed aligned?  You can actually use the request ID as an
    // offset into a struct too (since the IDs are guaranteed stable).
    switch (request) {
       case NPY_CONST_finfo_max:
           *(my_type *)data = value;  // or memcpy(data, &value, sizeof(value)) if unaligned is acceptable (no strong opinion from me)
           return 0;
       case ...:

       default:
           // not sure also with free-threading, I think I'd say we
           // can hold the GIL and just set an error, but we could also
           // say that a -2 return just means "undefined" without an error.
    }
}

This would give us a fast and simple way to query this on the C-side (that we could also expose as public C-API while keeping the definition of the Python side .finfo attribute value to NumPy (meaning it is guaranteed to look always the same).

One reason I like this, is that we really could use a way to fetch zero easily for example.

Now, that doesn't actually allow us to add the .finfo attribute magically :). But that part could be a very lightweight superclass that does that.
For convenience, I would also be very happy to just add an INEXACT flag and let NumPy internals add the superclass (or just the .finfo attribute) as we want to.

EDIT: Ahh, but a lot of the constants here are integers. Maybe that is fine, we just have to define them to be npy_intp/Py_ssize_t?

@SwayamInSync
Member Author

Thanks @seberg for checking it out

  1. should we move more towards C?
  2. should we just make get_finfo return the finfo object (and make it a "python side" problem).

So in the previous PR the discussion was closed as: for now (just for the sake of availability) we can register this on the Python side; for a complete, packaged solution, we need to plan thoroughly for the hierarchy of the numeric dtypes, implementing abstract superclasses (which user-defined dtypes can inherit, and which also give them access to methods like finfo, iinfo, and other generic metadata).
My naive opinion is still maybe that just using slots is fine

What should dtype.finfo return (I would say it should be a property)? A dict feels wrong, a dataclass/protocol (without further typing) might be great, though.

In quaddtype we are doing it as a dataclass, although one point is that finfo properties like max and eps should be of the type of the user-defined dtype (e.g., in our case they should be of QuadPrecDType(), since converting them to float can give rounding errors). @jorenham do we need to update this as well for the stubs (more generic, as the user dtype can be anything)?

Now, that doesn't actually allow us to add the .finfo attribute magically :). But that part could be a very lightweight superclass that does that.
For convenience, I would also be very happy to just add an INEXACT flag and let NumPy internals add the superclass (or just the .finfo attribute) as we want to.

Yes or we can also register .tp_methods for this

@seberg
Member

seberg commented Sep 20, 2025

Yes or we can also register .tp_methods for this

Right, but we can write a generic version in NumPy that initializes np.finfo (the biggest annoyance is that we need to change np.finfo a bit more maybe). Then dtypes don't have to do it by themselves and we can't diverge.
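The "generic version in NumPy" idea could be sketched like this (InexactDTypeBase and _finfo_values are hypothetical names; this is a Python-level illustration of the division of responsibility, not proposed API):

```python
# Sketch: NumPy would own the shape of the .finfo attribute, while each
# dtype only supplies its raw values, so implementations cannot diverge.
# (InexactDTypeBase and _finfo_values are hypothetical names.)
class InexactDTypeBase:
    def _finfo_values(self):
        raise NotImplementedError("dtype must supply its finfo values")

    @property
    def finfo(self):
        # The base class controls the presentation, so .finfo always
        # looks the same regardless of which dtype provides the values.
        return dict(self._finfo_values())

class MyQuadDType(InexactDTypeBase):
    def _finfo_values(self):
        return {"eps": 2.0 ** -112, "nmant": 112}
```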

I thought about it a bit more and came to this public API for your side. It might be missing a few values, but I doubt they are important, and many values in finfo can be calculated from others, so it would only be missing a few things, I think. (See details; mild modifications from the above, like I think maybe just guaranteeing the GIL is good for this -- nobody needs all constants, you can normally query them ahead of time, IMO.)

I really like this, because it also allows a bit saner handling for ufunc initial values. They are already adjustable on a per-loop basis, but this might be a great way to get to a default that is basically always right, and saner than what we have right now.

Details
/*
 * Constants that can be queried and used e.g. by reduce identity defaults.
 * These are also used to expose .finfo and .iinfo for example.
 */
// Not sure about the first ones, but feels like "why not"
#define NPY_CONSTANT_empty_fill 0  // for "needs init" and if it isn't just 0.
#define NPY_CONSTANT_zero_fill 1  // we have a zero filling loop, so not sure.
#define NPY_CONSTANT_ones_fill 2  // np.ones() value  (may not be numerically 1)
/* Numerical constants */
#define NPY_CONSTANT_zero 3
#define NPY_CONSTANT_one 4
// I forgot negative zero, it does make sense here.
#define NPY_CONSTANT_finite_max 5
#define NPY_CONSTANT_finite_min 6
#define NPY_CONSTANT_inf 7
#define NPY_CONSTANT_ninf 8
#define NPY_CONSTANT_nan 9
#define NPY_CONSTANT_finfo_eps 10
#define NPY_CONSTANT_finfo_epsneg 11
#define NPY_CONSTANT_finfo_smallest_normal 12
#define NPY_CONSTANT_finfo_smallest_subnormal 13
/* Constants that are always of integer type, value is `npy_intp/Py_ssize_t` */
#define NPY_CONSTANT_finfo_nmant 14
#define NPY_CONSTANT_finfo_nexp 15
/* It may make sense to continue with other constants here, e.g. pi, etc? */

/*
 * Function to get a constant value for the dtype.  Data may be unaligned, the
 * function is always called with the GIL held.
 *
 * @param descr The dtype instance (i.e. self)
 * @param ID The ID of the constant to get.
 * @param data Pointer to the data to get the constant from
 * @returns 1 on success, 0 if the constant is not available and -1 with an error set.
 */
typedef int (PyArrayDTypeMeta_GetConstant)(PyArray_Descr *descr, int ID, void *data);

Contributor

@mhvk mhvk left a comment

A few comments on the implementation.

It does feel more logical for dtype.finfo to be a property, and it seems to me it might as well return an object with attributes rather than a mapping. Of course, that means that effectively it just returns what for all practical purposes is a finfo instance. So, maybe it should just be a finfo instance, perhaps one defined in C, with slots...

p.s. I still like the idea that only inexact dtypes have finfo, though perhaps the C implementation can simply ensure that an AttributeError is raised if it is accessed on the Python side but not defined.

self._str_smallest_subnormal = machar._str_smallest_subnormal.strip()
return self

def _init_from_finfo_obj(self, dtype, finfo_obj):
Contributor

Conceptually, this really is a class method, so I'd do

@classmethod
def _from_dtype_finfo(cls, dtype):
    self = object.__new__(cls)
    finfo = dtype.finfo()  # Would remove trailing "_obj".
    self.dtype = ...

Member

It's not! It should be a method, we should support e.g. mpf floats.

Member

I agree with @mhvk here: constructor functions should be classmethods.

@seberg I'm not sure what you're talking about here (even though I know mpmath)

Member

Sorry, mind slip, thought this was something else :/


def _init_from_finfo_obj(self, dtype, finfo_obj):
self.dtype = numeric.dtype(dtype)
self.precision = finfo_obj.get('precision')
Contributor

I'd tend to make this a loop, as is done in _init:

for attr in ("precision", ....):
    setattr(self, attr, finfo.get(attr))

Though it does suggest that perhaps finfo should be something that has attributes instead of a mapping... (again, like is done in _init -- indeed, with a bit of refactoring of _get_machar, it may be possible to factor out some of the _init code and use it here).

if hasattr(dtype, "finfo") and issubclass(dtype.type, numeric.inexact):
# User-defined dtype with finfo support
finfo_obj = dtype.finfo()
obj = object.__new__(cls)._init_from_finfo_obj(dtype, finfo_obj)
Contributor

With the class method suggested below, this becomes return cls._from_dtype_finfo(dtype)

# User-defined dtype with finfo support
finfo_obj = dtype.finfo()
obj = object.__new__(cls)._init_from_finfo_obj(dtype, finfo_obj)
return obj
Contributor

Shouldn't this be put in _finfo_cache?
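The _finfo_cache pattern referred to here amounts to build-once-per-dtype, reuse-thereafter; a simplified stand-in (cached_finfo is a hypothetical helper, not the real numpy code):

```python
# Sketch of the finfo caching pattern: construct the finfo object once per
# dtype key and reuse it on later lookups. (Simplified, hypothetical stand-in.)
_finfo_cache = {}

def cached_finfo(dtype_key, build):
    obj = _finfo_cache.get(dtype_key)
    if obj is None:
        obj = build()            # construct only on the first lookup
        _finfo_cache[dtype_key] = obj
    return obj
```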

@mhvk
Contributor

mhvk commented Sep 20, 2025

@seberg - I like the slots idea! But why the difference between zero_fill and zero? (etc.)

@seberg
Member

seberg commented Sep 21, 2025

It's really for np.ones: "one" isn't the neutral element of multiplication for strings, so we have two uses for one. So I thought the distinction may be best, just to make sure code can fetch a mathematically valid 0/1.

EDIT: To be fair, an explicit empty/ones function may be better for these, but maybe it's still good, if just to clarify the distinction.

@jorenham
Member

In quaddtype we are doing it as a dataclass, although one point that the finfo properties like max, eps they should be of the type of user-defined dtype (for eg, in our case it should be of QuadPrecDType() considering them to float can give rounding errors) @jorenham do we need to update this as well for stubs (more generic as user dtype can be anything)?

QuadDType will be a dtype[np.floating], so its scalar type will subclass np.floating, right? Because if so, then there would be no need to update the stubs for that, since it already accepts those (np.inexact even)

@juntyr

juntyr commented Sep 21, 2025

At least in the finfo docstring, it is mentioned that many props are floats (eps : float). Ideally, they should always be of the dtype.type and also be typed that way.

@jorenham
Member

At least in the finfo docstring, it is mentioned that many props are floats (eps : float). Ideally, they should always be of the dtype.type and also be typed that way.

the docs are wrong then:

>>> import numpy as np
>>> f = np.finfo(np.float32)
>>> f.eps, f.epsneg, f.resolution, f.smallest_subnormal, f.max, f.min
(np.float32(1.1920929e-07), np.float32(5.9604645e-08), np.float32(1e-06), np.float32(1e-45), np.float32(3.4028235e+38), np.float32(-3.4028235e+38))

Note that the stubs annotate this correctly

@mhvk
Contributor

mhvk commented Sep 21, 2025

It's really for np.ones: "one" isn't the neutral element of multiplication for strings, so we have two uses for one. So I thought the distinction may be best, just to make sure code can fetch a mathematically valid 0/1.

Ah, yes, most useful is the version that in multiplication with itself stays identical, but then for strings that just doesn't exist, so it would be empty. Arguably, one doesn't really need another version, but you're right that one might as well support the use in np.ones just for backward compatibility.

@seberg
Member

seberg commented Sep 21, 2025

Actually, I meandered back: we already have a zeros special case, and that is really the one interesting one (empty is just for objects right now, and ones is just a cast).

I would like to use this for simple identities in ufuncs -- it looks like it might be slightly messier due to Python objects (but not sure).
Anyway, I think this can work out. Defining these for all of our dtypes is a bit tedious considering that it may be a bit underutilized for the moment, but probably not too terrible.

@SwayamInSync
Member Author

Okay, so I believe we are confidently leaning towards the C slots API with some baseclass to register the corresponding finfo method, right @seberg?
Please give a heads up or confirmation to this and I can start towards the refactoring of this PR itself.

One other point on the lazy lookup: this way the finfo cache would be filled according to the user's lookup pattern (unlike the current approach, which loads everything at once). I don't think that would be an issue; on the other hand, it can resolve the issue of None-able quantities by raising an undefined error.

@seberg
Member

seberg commented Sep 21, 2025

I had looked at doing this a bit today and got to: #29778

It basically works, certainly needs polishing/tests. And I am happy if you just use that to make a new PR with my start.
And I really like that it fully deletes all of that crazy python side finfo code!

Unfortunately, there is some fallout, just because I wanted to use the new _getitem() dtype slot to return scalars... and I am not even quite sure why yet! The simple workaround would be to create a temporary array in the finfo population function; then we don't have to do that part!

@SwayamInSync I doubt I have a lot of time this week, but if you like, don't hesitate to take over. (And e.g. undo the _getitem/_setitem changes and just create that temporary array within populate_..., which should be straightforward.)

@SwayamInSync
Member Author

Thanks @seberg, I'll take it from there; happy to see the machar logic gone.

@charris
Member

charris commented Sep 21, 2025

close/reopen

@charris charris closed this Sep 21, 2025
@charris charris reopened this Sep 21, 2025
@SwayamInSync
Member Author

Closing this one as we made good progress in #29836
