Skip to content

Conversation

@yqyqyq-W
Copy link
Contributor

@yqyqyq-W yqyqyq-W commented Nov 15, 2022

@yqyqyq-W yqyqyq-W changed the title Bug: Fix Inconsistent Behavior for Series.searchsorted Bug: Fix Inconsistent Behavior for Series.searchsorted #49620 Nov 15, 2022
@yqyqyq-W yqyqyq-W changed the title Bug: Fix Inconsistent Behavior for Series.searchsorted #49620 Bug: Fix Inconsistent Behavior for Series.searchsorted Nov 15, 2022
@yqyqyq-W yqyqyq-W marked this pull request as ready for review November 15, 2022 18:54
sorter: NumpySorter = None,
) -> npt.NDArray[np.intp] | np.intp:

if isinstance(value, pd.DataFrame):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check with ABCDataFrame

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the usage of ABCDataFrame. Does that mean I should use isinstance(value, ABCDataFrame) instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. at the top do from pandas.core.dtypes.generic import ABCDataFrame

@jbrockmendel
Copy link
Member

needs test

@yqyqyq-W yqyqyq-W marked this pull request as draft November 16, 2022 01:38
@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas Series Series data structure labels Nov 16, 2022
@yqyqyq-W yqyqyq-W marked this pull request as ready for review November 16, 2022 19:39

if isinstance(value, ABCDataFrame):
msg = (
"Value must be array-like or scalar, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add "1D" here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added 1-D. In fact, n-D object except DataFrame works well with this function. As long as we don't call pd.array(n-d DataFrame).

@jbrockmendel
Copy link
Member

there's another subtle bug you've uncovered

>>> df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
>>> pd.array(df)
<StringArray>
[
['[1 3]', '[1 3]'],
['[2 4]', '[2 4]']
]
Shape: (2, 2), dtype: string

pd.array(df) should probably raise, maybe return pd.array(df._values). Definitely not this StringArray.

exp = np.array([0, 2], dtype=np.intp)
tm.assert_numpy_array_equal(res, exp)

def test_searchsorted_Dataframe_fail(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you de-capitalize "DataFrame" here, and add a comment # GH#49620 on the nex tline

@yqyqyq-W yqyqyq-W marked this pull request as draft November 21, 2022 03:39
@yqyqyq-W
Copy link
Contributor Author

yqyqyq-W commented Nov 21, 2022

That's exactly the real problem. The reason why DataFrame should be forbidden here is that type infer in pd.array doesn't work on n-D DataFrame. But such a fix is far beyond my ability.

@yqyqyq-W yqyqyq-W marked this pull request as ready for review November 21, 2022 06:18
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 22, 2022
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jan 4, 2023
@jbrockmendel
Copy link
Member

@mroeschke i think this might be fine as-is; i can do a separate PR to fix pd.array

@jbrockmendel jbrockmendel reopened this Jan 5, 2023
@mroeschke mroeschke added this to the 2.0 milestone Jan 6, 2023
@mroeschke mroeschke merged commit c29736a into pandas-dev:main Jan 6, 2023
@mroeschke
Copy link
Member

Thanks @yqyqyq-W

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Error Reporting Incorrect or improved errors from pandas Series Series data structure Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Series.searchsorted(...) fails with Timestamp DataFrames

3 participants