-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
ENH: Allow convert_dtypes to convert to pd.ArrowDtype #50094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Allow convert_dtypes to convert to pd.ArrowDtype #50094
Conversation
jrbourbeau
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice @mroeschke!
I definitely think the io.nullable_backend -> mode.nullable_backend change makes sense given it's being generalized outside of just read_* I/O functions.
| def test_pyarrow_nullable_backend_already_pyarrow(self): | ||
| pytest.importorskip("pyarrow") | ||
| expected = pd.DataFrame([1, 2, 3], dtype="int64[pyarrow]") | ||
| with pd.option_context("mode.nullable_backend", "pyarrow"): | ||
| result = expected.convert_dtypes() | ||
| tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does setting mode.nullable_backend to "pyarrow" work when the original dtypes are already numpy-backed extension dtypes? For example, something like
expected = pd.DataFrame([1, 2, 3], dtype="Int64")
with pd.option_context("mode.nullable_backend", "pyarrow"):
result = expected.convert_dtypes()
tm.assert_frame_equal(result, expected)or adding a df.convert_dtypes() line before with pd.option_context("mode.nullable_backend", "pyarrow"): in test_pyarrow_nullable_backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup this should convert the numpy-backed types to pyarrow-backed types. Will add a unit test for it
| infer_objects : bool, defaults False | ||
| Whether to also infer objects to float/int if possible. Is only hit if the | ||
| object array contains pd.NA. | ||
| nullable_backend : str, default "pandas" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with the mode.nullable_backend option? My guess is folks would expect nullable_backend to take precedent if it's specified, but for mode.nullable_backend to take precedent if nullable_backend isn't given.
Here's a tiny example of the use case where I'm asking what happens:
with pd.option_context("mode.nullable_backend", "pyarrow"):
result = expected.convert_dtypes(nullable_backend="pandas")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line specifically is a private helper function that backs the public Series/DataFrame.convert_dtypes.
The public Series/DataFrame.convert_dtypes will not accept a keyword argument nullable_backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see -- thanks for clarifying. So the nullable_backend config option is how public APIs toggle here. Good to know.
phofl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
convert_dtypesto convert topd.ArrowDtype#49997 (Replace xxxx with the GitHub issue number)doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.Additionally changed
io.nullable_backendtomode.nullable_backendto make it more general