BUG: displaying string dtypes not showing storage option #50151

phofl · 2022-12-09T14:28:00Z

closes BUG: DataFrame.dtypes doesn't include backend for string columns #50099
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @mroeschke Not saying that this is a good fix, but it shows that we have to use repr for the string dtypes in some way. Is there a more general rule to accomplish this?
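Below is a minimal pure-Python sketch of the problem, which does not require pandas. It uses two hypothetical names: a toy class `ToyStorageDtype`, modelled on the `__str__`/`__repr__` split in `StorageExtensionDtype` quoted later in this thread, and a helper `format_dtype`. A formatter that calls `str()` drops the storage. One that falls back to `repr()` for dtypes carrying a `storage` attribute keeps it. This only illustrates the idea. It is not how pandas' formatter is implemented.

```python
class ToyStorageDtype:
    """Hypothetical stand-in for a dtype that may have several backends."""

    def __init__(self, name: str, storage: str) -> None:
        self.name = name
        self.storage = storage

    def __repr__(self) -> str:
        # Includes the backend, e.g. "string[python]"
        return f"{self.name}[{self.storage}]"

    def __str__(self) -> str:
        # Bare name only, e.g. "string"
        return self.name


def format_dtype(dtype: object) -> str:
    """Hypothetical rule: use repr() when the dtype carries a storage, else str()."""
    if getattr(dtype, "storage", None) is not None:
        return repr(dtype)
    return str(dtype)


dtype = ToyStorageDtype("string", "python")
print(str(dtype))             # string
print(format_dtype(dtype))    # string[python]
print(format_dtype("Int64"))  # Int64 (no storage attribute, so str() is used)
```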

mroeschke · 2022-12-12T18:40:45Z

Looks to be a doctest error

```
=================================== FAILURES ===================================
_____________ [doctest] pandas.core.generic.NDFrame.convert_dtypes _____________
6466         Convert the DataFrame to use best possible dtypes.
6467 
6468         >>> dfn = df.convert_dtypes()
6469         >>> dfn
6470            a  b      c     d     e      f
6471         0  1  x   True     h    10   <NA>
6472         1  2  y  False     i  <NA>  100.5
6473         2  3  z   <NA>  <NA>    20  200.0
6474 
6475         >>> dfn.dtypes
Differences (unified diff with -expected +actual):
    @@ -1,7 +1,7 @@
    -a      Int32
    -b     string
    -c    boolean
    -d     string
    -e      Int64
    -f    Float64
    +a             Int32
    +b    string[python]
    +c           boolean
    +d    string[python]
    +e             Int64
    +f           Float64
     dtype: object
```
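Every line of the expected output changes, not only the `b` and `d` rows. The reason is that the values column is right-aligned to its widest entry, so the longer `string[python]` pushes all the other rows to the right. The sketch below mimics that layout with a hypothetical `render_dtypes` helper. The dtype strings come from the log above. The layout rule (labels left-aligned, values right-aligned, four spaces between columns) is inferred from that output and is not pandas' actual formatter.

```python
def render_dtypes(dtypes: dict[str, str]) -> str:
    """Lay out label/dtype pairs the way a ``DataFrame.dtypes`` listing looks.

    Labels are left-aligned, values right-aligned to the widest value,
    with four spaces between the two columns (a sketch, not pandas' formatter).
    """
    label_width = max(len(label) for label in dtypes)
    value_width = max(len(value) for value in dtypes.values())
    lines = [
        f"{label:<{label_width}}    {value:>{value_width}}"
        for label, value in dtypes.items()
    ]
    lines.append("dtype: object")
    return "\n".join(lines)


# Dtype strings taken from the doctest output above
before = {"a": "Int32", "b": "string", "c": "boolean",
          "d": "string", "e": "Int64", "f": "Float64"}
after = dict(before, b="string[python]", d="string[python]")

print(render_dtypes(before))
print(render_dtypes(after))
```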

mroeschke · 2022-12-12T18:42:19Z

pandas/tests/io/formats/test_to_string.py

```diff
+def test_to_string_string_dtype():
+    # GH#50099
+    if pa_version_under6p0:
+        pytest.skip()
```


Nit: Could you use the pytest.mark.skipif decorator?

good point, changed
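A sketch of the suggested decorator form. The skip condition moves from the test body into `pytest.mark.skipif`, so the reason is reported at collection time. The flag name `pa_version_under6p0` comes from the original test. Recomputing it locally from `pyarrow.__version__` is an assumption made here so the snippet is self-contained. The test body is not shown in the review excerpt and is left elided.

```python
import pytest

# Assumption: local stand-in for the pa_version_under6p0 flag used in the test above
try:
    import pyarrow

    pa_version_under6p0 = int(pyarrow.__version__.split(".")[0]) < 6
except ImportError:
    pa_version_under6p0 = True


@pytest.mark.skipif(pa_version_under6p0, reason="pyarrow>=6.0 required")
def test_to_string_string_dtype():
    # GH#50099
    ...  # test body not shown in the review excerpt
```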

mroeschke · 2022-12-12T18:47:17Z

> Is there a more general rule to accomplish this?

Yeah, I don't think there is one at the moment. Not sure why StringDtype (before StorageDtype was refactored away) has this str vs repr difference.

Looks like I actually hacked around this when creating ArrowDtype lol

```python
from pandas.api.extensions import ExtensionDtype


class StorageExtensionDtype(ExtensionDtype):
    """ExtensionDtype that may be backed by more than one implementation."""

    name: str
    _metadata = ("storage",)

    def __init__(self, storage=None) -> None:
        self.storage = storage

    def __repr__(self) -> str:
        # Name plus backend, e.g. "string[python]"
        return f"{self.name}[{self.storage}]"

    def __str__(self) -> str:
        # Bare name only, e.g. "string"
        return self.name


class ArrowDtype(StorageExtensionDtype):
    # __init__ (which sets self.pyarrow_dtype) and other members omitted

    def __repr__(self) -> str:
        return self.name

    @property
    def name(self) -> str:  # type: ignore[override]
        """
        A string identifying the data type.
        """
        return f"{str(self.pyarrow_dtype)}[{self.storage}]"
```
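The effect of the excerpt above can be checked without pandas. The classes below are hypothetical toy mirrors: `StorageDtypeSketch`, `StringDtypeSketch` and `ArrowDtypeSketch`. `pyarrow_type` is a plain string standing in for a real pyarrow `DataType`. In the base class, `str()` and `repr()` differ. In the Arrow-style subclass, the storage is folded into `name` itself, so both spellings agree. This is the "hack" mentioned above.

```python
class StorageDtypeSketch:
    """Toy mirror of StorageExtensionDtype: repr shows storage, str does not."""

    name: str

    def __init__(self, storage=None) -> None:
        self.storage = storage

    def __repr__(self) -> str:
        return f"{self.name}[{self.storage}]"

    def __str__(self) -> str:
        return self.name


class StringDtypeSketch(StorageDtypeSketch):
    name = "string"


class ArrowDtypeSketch(StorageDtypeSketch):
    """Toy mirror of ArrowDtype: storage is baked into ``name``."""

    def __init__(self, pyarrow_type: str) -> None:
        super().__init__("pyarrow")
        self.pyarrow_type = pyarrow_type  # plain string here, e.g. "int64"

    def __repr__(self) -> str:
        return self.name

    @property
    def name(self) -> str:
        return f"{self.pyarrow_type}[{self.storage}]"


s = StringDtypeSketch("python")
a = ArrowDtypeSketch("int64")
print(str(s), repr(s))  # string string[python]
print(str(a), repr(a))  # int64[pyarrow] int64[pyarrow]
```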

phofl · 2022-12-12T19:13:48Z

Thx, fixed the doctest.

OK, then we can keep it as is :)

mroeschke

LGTM can merge on green

jrbourbeau

Thanks @phofl!

jorisvandenbossche · 2022-12-13T21:45:24Z

> Not sure why StringDtype (before StorageDtype was refactored away) has this str vs repr difference.

As far as I remember, this was actually done intentionally, and personally I would prefer to keep it. But let's discuss it on the issue (I will reopen that one).

BUG: displaying string dtypes not showing storage option

0aa1454

phofl added the Bug, Output-Formatting, and Strings labels Dec 9, 2022

phofl added 2 commits December 9, 2022 18:00

Skip when no pyarrow

befb05f

Merge remote-tracking branch 'upstream/main' into 50099

b6355c8

mroeschke reviewed Dec 12, 2022

Address review

e345684

Merge branch 'main' into 50099

aad2a5a

mroeschke added this to the 2.0 milestone Dec 13, 2022

mroeschke approved these changes Dec 13, 2022

phofl merged commit d050408 into pandas-dev:main Dec 13, 2022

jrbourbeau reviewed Dec 13, 2022

jorisvandenbossche mentioned this pull request Dec 13, 2022

BUG: DataFrame.dtypes doesn't include backend for string columns #50099 (open)

phofl deleted the 50099 branch December 13, 2022 22:30

4 participants
