I have a DataFrame with MultiIndex columns, built as follows:

import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

s = pd.DataFrame(np.random.randn(8), index=index).T

which looks like this

                    bar                      baz                   foo                      qux
          one       two          one         two         one       two          one         two
0   -0.144135   0.625481    -2.139184   -1.066893   -0.123791   -1.058165   0.495627    -0.654353

The documentation says to index such a frame in the following way:

df.loc[:, (slice("bar", "two"), ...)]

and so I do

s.loc[:, (slice("bar", "two"):(slice("baz", "two")))]

which gives me a SyntaxError.

  Cell In[98], line 3
    s.loc[:, (slice("bar", "two"):(slice("baz", "two")))]
                                 ^
SyntaxError: invalid syntax

In my specific use case (beyond the scope of this question), the level 1 labels are timestamps (years), but I figure the answer should be the same. What is the proper way to select a range of columns when the columns are a MultiIndex?

  • You are using slice incorrectly. In Python, slice takes start and stop as positional arguments, plus an optional step. Commented Oct 19, 2024 at 22:10

2 Answers

If you want the columns from ("bar", "two") through ("baz", "two"), slice with the two tuples directly:

s.loc[:, ("bar", "two"):("baz", "two")]

The result looks like this:

first            bar                      baz
second           two          one         two
     0      0.625481    -2.139184   -1.066893
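
The question mentions that the real level 1 labels are timestamps. As a minimal sketch of the same tuple slice in that setting, the frame below is a hypothetical stand-in (the names `years`, `columns`, and `df` and the integer values are assumptions, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: level 1 holds one timestamp per year.
years = pd.to_datetime(["2020-01-01", "2021-01-01"])
columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], years], names=["first", "second"]
)
# One row with the values 0..5 so each column is easy to identify.
df = pd.DataFrame(np.arange(6).reshape(1, 6), columns=columns)

# Start and stop are tuples again, with Timestamp objects at level 1.
start = ("bar", pd.Timestamp("2021-01-01"))
stop = ("baz", pd.Timestamp("2021-01-01"))
subset = df.loc[:, start:stop]

print(subset.iloc[0].tolist())
```

With these assumed labels the slice should cover three columns, since label-based slicing with .loc includes both endpoints.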

As per the documentation, you have a few options to return this slice:

Option 1: hierarchical index using tuples (docs section)

(See also answer by @Koki.)

s.loc[:, ('bar', 'two'):('baz', 'two')]

Here the start (('bar', 'two')) and the stop (('baz', 'two')) are plain tuples, with a colon (:) between them to select the range of columns from start to stop, both included.

Option 2: using slicers (docs section, cf. slice)

s.loc[:, slice(('bar', 'two'), ('baz', 'two'))]

The signature is slice(start, stop[, step]), so that ('bar', 'two') gets passed as start and ('baz', 'two') as stop.
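
To illustrate why the colon form and slice(start, stop) are interchangeable, here is a small sketch in plain Python. `Probe` is a hypothetical helper, not part of pandas; it only returns the key it is indexed with. The second half suggests why the attempt in the question fails: a colon is slice syntax only directly inside the square brackets, not inside parentheses.

```python
class Probe:
    """Hypothetical helper that returns whatever key it is indexed with."""

    def __getitem__(self, key):
        return key


probe = Probe()

# The colon inside [] builds a slice object from the two tuples.
colon_key = probe[("bar", "two"):("baz", "two")]
# Passing slice(start, stop) explicitly yields an equal object.
explicit_key = probe[slice(("bar", "two"), ("baz", "two"))]
print(colon_key == explicit_key)

# A colon wrapped in parentheses is not slice syntax, so it does not compile.
try:
    compile('probe[:, ("bar":"baz")]', "<example>", "eval")
    colon_in_parens_ok = True
except SyntaxError:
    colon_in_parens_ok = False
print(colon_in_parens_ok)
```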

Option 3: using pd.IndexSlice

idx = pd.IndexSlice
s.loc[:, idx['bar', 'two']:idx['baz', 'two']]

Similar to option 1 (start + : + stop): idx['bar', 'two'] simply evaluates to the tuple ('bar', 'two').


All three of these result in:

# using `np.random.seed(0)` for reproducibility

first        bar       baz          
second       two       one       two
0       0.400157  0.978738  2.240893
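
As a quick check that the three options agree, the sketch below rebuilds the frame from the question with a fixed seed and compares the results. The variable names `by_tuples`, `by_slice`, and `by_indexslice` are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the frame is reproducible

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
s = pd.DataFrame(np.random.randn(8), index=index).T

idx = pd.IndexSlice
by_tuples = s.loc[:, ("bar", "two"):("baz", "two")]              # option 1
by_slice = s.loc[:, slice(("bar", "two"), ("baz", "two"))]      # option 2
by_indexslice = s.loc[:, idx["bar", "two"]:idx["baz", "two"]]   # option 3

# DataFrame.equals compares labels and values.
all_equal = by_tuples.equals(by_slice) and by_slice.equals(by_indexslice)
print(all_equal)
print(list(by_tuples.columns))
```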

2 Comments

This is amazingly helpful. I thought pd.IndexSlice was being deprecated, no?
You're welcome. pd.IndexSlice is not being deprecated. If it were, this would certainly be mentioned in release notes and/or in the official documentation, and it is not.