TST: Add direct tests for join_inner in table/_np_utils Cython extension#19458
TST: Add direct tests for join_inner in table/_np_utils Cython extension#19458ReemHamraz wants to merge 17 commits intoastropy:mainfrom
Conversation
Add test_np_utils_extension.py, which imports the compiled _np_utils extension module directly (without going through astropy.table.Table or astropy.table.join) and exercises join_inner() across all four join types (inner/outer/left/right) with hand-constructed index inputs. Also add test_np_utils.py with tests for common_dtype() in np_utils.py. This is part of the work described in astropy#19249 to build a standalone test layer for astropy's compiled extension modules.
|
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
|
|
pre-commit.ci autofix |
a401338 to
14b0a45
Compare
dddc9ec to
8c730c9
Compare
|
pre-commit.ci autofix |
c54b6e7 to
8dcadba
Compare
843ba46 to
dad7647
Compare
Expands the standalone test coverage for the Cython extension in . Previously, tests only verified the structural output (e.g., and the boolean flag). This commit adds strict assertions using to validate the actual contents of the computed , , , and arrays across all four join types (inner, outer, left, right). Additionally, adds a function to ensure the C-extension handles zero-length inputs gracefully without segfaulting.
3180166 to
656cbd4
Compare
1a9e5af to
a7d9b4f
Compare
There was a problem hiding this comment.
This looks like a great start. I think we need to iterate a couple times to reduce in-between tests duplication though. I'd suggest moving as much tests to a general class (not bound to any particular jointype) and parametrize it over jointypes
as it makes sense. I'll then re-review in more depth. From a relatively quick gaze, tests look sensible and minimal, good job !
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
|
@neutrinoceros I have refactored the test suite based on your feedback so like I condensed all the four classes into a single
Also please lemme know what you think of this |
neutrinoceros
left a comment
There was a problem hiding this comment.
thanks, it looks much better now !
I think all your tests are sensible, but they all focus on equal-size left/right inputs. I think we should also include more diverse test cases like
- inputs of different sizes
- inputs with 0 elements (left, right and both)
I'm also curious what happens with non-integer dtypes ? in particular, what happens if there's a NaN in left and right ? I'm asking because NaN doesn't compare equal to itself, so it could break the helper function in interesting ways.
I actually tried adding empty array tests previously, but it caused the test suite to instantly fail with a Segmentation Fault, and since the public Python API in |
536e763 to
b0d5d90
Compare
28ef2b3 to
0a17f0a
Compare
Indeed, I see now what this happens. I wonder if raising an exception is actually the correct behavior, I would naively assume that at least some join types could still logically accept empty tables... Any way, happy to declare this out of scope for the present PR. |
neutrinoceros
left a comment
There was a problem hiding this comment.
we're in the late stage review;
A high level remark is that it's a bit hard to read parametrizations when there are so many parameters in a single pytest.param.
One way to address this would be to define a couple small dataclasses to hold the information as:
from dataclasses import dataclass
Array = np.ndarray[tuple[int], np.dtype[np.float64]]
Mask = np.ndarray[tuple[int], np.dtype[np.bool_]]
@dataclass(kw_only=True, slots=True, frozen=True)
class ArrayMaskPair:
array: Array
mask: Mask
@dataclass(kw_only=True, slots=True, frozen=True)
class ExpectedResults:
left: ArrayMaskPair
right: ArrayMaskPair|
I intend my next review to be the final round. I'll be focusing on wether all tests are actually correct, but it'll be easier to do once my current set of suggestions are addressed. |
7511d46 to
87fa88a
Compare
|
|
||
| @dataclass(kw_only=True, slots=True, frozen=True) | ||
| class ArrayMaskPair: | ||
| array: list | np.ndarray |
There was a problem hiding this comment.
Nice, this single type annotation has at least 3 flaws that we can use to up your skills !
list and np.ndarray are both generic types; in simple terms, this means they define the shape of the container, but additional information is needed to type their content (for instance list[float]).
np.ndarray isn't supported in numpy as a type annotation. It's quite a bit more involved to type these correctly, especially in cases where multiple dimensionalities and/or dtypes are possible, but to give a concise example of how to declare the type of a 2D array of single precision floats:
np.ndarray[tuple[int, int], np.dtype[np.float32]]
Moreover, type unions (|) are generally to be avoided when possible (because they make the whole exercise both more difficult and less impactful). They should not be needed here: I'm sure we can decide on a single type and use it unconditionally (probably arrays).
There was a problem hiding this comment.
To be clear: none of this matters at runtime. You could use array: int and the tests would run just the same, but that's not a reason not to make type annotations actually useful at type-check time !
|
|
||
| assert n_out == 3 | ||
| assert masked == 0 | ||
| np.testing.assert_array_equal(left_out, [0, 1, 2]) |
There was a problem hiding this comment.
we usually just import this function at the top level
Part of GSoC 2026 proposal: Hardening Astropy's Core Stability
This PR adds direct tests for the
join_innerfunction inastropy/table/_np_utils.pyx, bypassing the publicastropy.table.join()API to test the compiled Cython interface directly.join_inner()is the single function exposed by this extension and is the core index-computation engine for all table join operations in astropy. It currently has zero standalone test coverage, it is only exercised indirectly through the public API.Tests cover
len(left_out) == n_outfor all join types)len(left_mask) == n_outfor all join types)The helper
_make_join_inputs()reproduces the exact index-preparation thatoperations.pyperforms before callingjoin_inner(), keeping the tests fully self-contained with no dependency on the public API.