multi index with .loc on columns

Question

I have a DataFrame with a MultiIndex on its columns, built as follows:

import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

# One random column, transposed so the MultiIndex ends up on the columns
s = pd.DataFrame(np.random.randn(8), index=index).T

which looks like this:

first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
0      -0.144135  0.625481 -2.139184 -1.066893 -0.123791 -1.058165  0.495627 -0.654353

The documentation says to index it like this:

df.loc[:, (slice("bar", "two"), ...)]

So I tried:

s.loc[:, (slice("bar", "two"):(slice("baz", "two")))]

which raises a SyntaxError:

  Cell In[98], line 3
    s.loc[:, (slice("bar", "two"):(slice("baz", "two")))]
                                 ^
SyntaxError: invalid syntax

In my specific use case (admittedly beyond the scope of this question), the level-1 index values are year Timestamps, but I expect the answer to be the same. What is the proper way to select a range of columns from MultiIndex columns with .loc?

You are using slice incorrectly: Python's slice takes start, stop, and an optional step as positional arguments.
– sammywemmy, Commented Oct 19, 2024 at 22:10
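To make the comment concrete: inside square brackets Python rewrites `a:b` into `slice(a, b)`, and `slice("bar", "two")` is a range from "bar" to "two", not the column key ("bar", "two"). A minimal plain-Python sketch; `ShowKey` is a hypothetical helper that only echoes the key it receives.

```python
class ShowKey:
    """Hypothetical helper: returns whatever key Python passes to []."""

    def __getitem__(self, key):
        return key


k = ShowKey()

# Inside [], a:b becomes slice(a, b) and comma-separated items become a tuple.
simple = k[1:3]
two_axes = k[:, ("bar", "two"):("baz", "two")]
print(simple)    # slice(1, 3, None)
print(two_axes)  # (slice(None, None, None), slice(('bar', 'two'), ('baz', 'two'), None))

# slice("bar", "two") means start="bar", stop="two"; it is not the key ("bar", "two").
sl = slice("bar", "two")
print(sl.start, sl.stop, sl.step)  # bar two None

# The a:b form is only allowed directly inside [], so wrapping it in () cannot parse.
raised = False
try:
    compile('k[:, (slice("bar", "two"):(slice("baz", "two")))]', "<demo>", "eval")
except SyntaxError:
    raised = True
print(raised)  # True
```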

Koki · Accepted Answer · 2024-10-19 22:09:51Z

3

To get the columns from ("bar", "two") through ("baz", "two"), use the tuples themselves as the start and stop of the slice:

s.loc[:, ("bar", "two"):("baz", "two")]

The result looks like this:

first            bar                      baz
second           two          one         two
     0      0.625481    -2.139184   -1.066893
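A `.loc` label slice includes its stop label, unlike positional slicing. A small sketch comparing it with `.iloc` on the same layout; it assumes `pd.MultiIndex.from_product` as a shorter equivalent of the question's `zip`/`from_tuples` construction and seeds NumPy so the numbers are reproducible:

```python
import numpy as np
import pandas as pd

# Same column layout as in the question; from_product yields the same
# (first, second) pairs in the same order as the zip/from_tuples construction.
columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
np.random.seed(0)  # only for reproducible numbers
s = pd.DataFrame(np.random.randn(1, 8), columns=columns)

by_label = s.loc[:, ("bar", "two"):("baz", "two")]

# .loc includes the stop label, so this covers column positions 1, 2 and 3.
# .iloc excludes its stop position, hence 1:4 for the same columns.
by_position = s.iloc[:, 1:4]

print(by_label.columns.tolist())     # [('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
print(by_label.equals(by_position))  # True
```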

edited Oct 19, 2024 at 22:09

answered Oct 19, 2024 at 22:03

Koki


ouroboros1 · Accepted Answer · 2024-10-21 06:11:30Z

2

Per the documentation, there are a few ways to select this range:

Option 1: hierarchical index using tuples (docs section)

(See also answer by @Koki.)

s.loc[:, ('bar', 'two'):('baz', 'two')]

Here the start (('bar', 'two')) and the stop (('baz', 'two')) are given as plain tuples, with a colon (:) between them to select the range of columns from start through stop. With .loc, both endpoints are included.
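Because the column index is sorted, the endpoints do not have to be full tuples; pandas also accepts keys that name only the first level, in the same way the docs slice rows with `df.loc["baz":"foo"]`. A sketch on the same layout, using `np.arange` values (an assumption, for predictable output) instead of random numbers:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
s = pd.DataFrame(np.arange(8.0).reshape(1, 8), columns=columns)

# Full tuples: from ("bar", "two") through ("baz", "two"), both ends included.
full = s.loc[:, ("bar", "two"):("baz", "two")]

# First-level labels only: every column under "bar" through every column under "baz".
partial = s.loc[:, "bar":"baz"]

print(full.columns.tolist())
# [('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
print(partial.columns.tolist())
# [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
```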

Option 2: using slicers (docs section, cf. slice)

s.loc[:, slice(('bar', 'two'), ('baz', 'two'))]

The signature is slice(start, stop[, step]), so ('bar', 'two') is passed as start and ('baz', 'two') as stop.
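Because `slice(start, stop)` is an ordinary Python object, option 2 is handy when the endpoints arrive as variables, e.g. function arguments. A minimal sketch; `column_range` is a hypothetical helper, not part of pandas, and the frame uses `np.arange` values so the output is predictable:

```python
import numpy as np
import pandas as pd


def column_range(df: pd.DataFrame, start: tuple, stop: tuple) -> pd.DataFrame:
    """Hypothetical helper: columns from start through stop (inclusive) via .loc."""
    return df.loc[:, slice(start, stop)]


columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
s = pd.DataFrame(np.arange(8.0).reshape(1, 8), columns=columns)

via_helper = column_range(s, ("bar", "two"), ("baz", "two"))
via_colon = s.loc[:, ("bar", "two"):("baz", "two")]

# slice(a, b) and a:b produce the same slice object, so the selections match.
print(via_helper.equals(via_colon))  # True
print(via_helper.iloc[0].tolist())   # [1.0, 2.0, 3.0]
```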

Option 3: using pd.IndexSlice

idx = pd.IndexSlice
s.loc[:, idx['bar', 'two']:idx['baz', 'two']]

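This is the same pattern as option 1 (start, colon, stop): pd.IndexSlice returns whatever key it is indexed with, so idx['bar', 'two'] is simply the tuple ('bar', 'two').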

All three of these result in:

# using `np.random.seed(0)` for reproducibility

first        bar       baz          
second       two       one       two
0       0.400157  0.978738  2.240893
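Option 3 reduces to option 1 because `pd.IndexSlice` hands back its key unchanged. Its more common use is writing one slice per level in a single expression. A sketch on the same layout (again with `np.arange` values as an assumption for predictable output):

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
s = pd.DataFrame(np.arange(8.0).reshape(1, 8), columns=columns)

# idx[...] returns its key unchanged: a tuple here, so this is option 1 again.
print(idx["bar", "two"])  # ('bar', 'two')

# One selector per level: "bar" through "baz" on level "first", only "two" on level "second".
per_level = s.loc[:, idx["bar":"baz", "two"]]
print(per_level.columns.tolist())  # [('bar', 'two'), ('baz', 'two')]
```

Note that this per-level form skips ('baz', 'one'), so it answers a different question than the contiguous range above.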

edited Oct 21, 2024 at 6:11

answered Oct 19, 2024 at 22:47

ouroboros1


2 Comments

plotmaster473 Over a year ago

This is amazingly helpful. I thought pd.IndexSlice was being deprecated, no?

ouroboros1 Over a year ago

You're welcome. pd.IndexSlice is not being deprecated. If it were, this would certainly be mentioned in release notes and/or in the official documentation, and it is not.
