@QuangHoang has suggested an easy way with query in the comments. Another option is the third-party pyjanitor library, which allows multiple selections on a MultiIndex with dictionaries, via its select method (disclosure: I am a contributor to the library):
# pip install pyjanitor
import pandas as pd
import janitor
df.select(columns=[{'Sample':'A', 'T':0.2}, {'Sample':'C', 'T':0.5}])
Sample         A         C
P1             1         5
P2             2         6
T            0.2       0.5
0       0.240285  0.966722
1       0.537845  0.605961
2       0.581185  0.680213
3       0.504225  0.158793
4       0.673914  0.020176
5       0.528920  0.616641
6       0.189066  0.612055
7       0.766254  0.161062
8       0.077120  0.593749
9       0.922863  0.217441
As the example above shows, each key of the dictionary is a level name in the MultiIndex, while the value is the actual label you wish to select on that level.
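The snippets in this answer assume a DataFrame whose columns are a four-level MultiIndex. A minimal reconstruction for running them yourself - only the ('A', 1, 2, 0.2) and ('C', 5, 6, 0.5) columns are confirmed by the output above; the other four columns are made-up placeholders:
import numpy as np
import pandas as pd

# hypothetical reconstruction - only the 'A' and 'C' tuples below
# appear in the output above; the rest are illustrative fillers
columns = pd.MultiIndex.from_tuples(
    [('A', 1, 2, 0.2), ('A', 3, 4, 0.1),
     ('B', 1, 2, 0.3), ('B', 3, 4, 0.4),
     ('C', 5, 6, 0.5), ('C', 7, 8, 0.6)],
    names=['Sample', 'P1', 'P2', 'T'])
df = pd.DataFrame(np.random.rand(10, 6), columns=columns)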
Another option is to build boolean masks on the column levels and extract the matching column entries before selecting with loc. The advantage of this is clarity: you know exactly what you are selecting, and you have more control:
cond1 = (df.columns.get_level_values('Sample') == 'A') & (df.columns.get_level_values('T') == 0.2)
cond2 = (df.columns.get_level_values('Sample') == 'C') & (df.columns.get_level_values('T') == 0.5)
indexer = df.columns[cond1 | cond2]
indexer
MultiIndex([('A', 1, 2, 0.2),
            ('C', 5, 6, 0.5)],
           names=['Sample', 'P1', 'P2', 'T'])
df.loc[:, indexer]
Sample         A         C
P1             1         5
P2             2         6
T            0.2       0.5
0       0.240285  0.966722
1       0.537845  0.605961
2       0.581185  0.680213
3       0.504225  0.158793
4       0.673914  0.020176
5       0.528920  0.616641
6       0.189066  0.612055
7       0.766254  0.161062
8       0.077120  0.593749
9       0.922863  0.217441
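If you need this pattern for more than two (Sample, T) pairs, the mask construction can be factored into a small helper. A sketch - select_pairs is a made-up name, not part of pandas:
from functools import reduce
import operator

def select_pairs(df, pairs):
    # build one boolean mask per (Sample, T) pair and OR them together
    samples = df.columns.get_level_values('Sample')
    ts = df.columns.get_level_values('T')
    mask = reduce(operator.or_, ((samples == s) & (ts == t) for s, t in pairs))
    return df.loc[:, mask]

select_pairs(df, [('A', 0.2), ('C', 0.5)])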
Another option is a combination of pandas' xs method and the concat function. Note that xs picks up every column whose T level matches the key, so this only returns the A and C columns because no other sample shares those T values:
pd.concat(
    [df.xs(key=0.2, level='T', axis=1, drop_level=False),
     df.xs(key=0.5, level='T', axis=1, drop_level=False)],
    axis=1)
Out[9]:
Sample         A         C
P1             1         5
P2             2         6
T            0.2       0.5
0       0.427995  0.257755
1       0.520026  0.370271
2       0.280932  0.402916
3       0.901517  0.337662
4       0.179267  0.399471
...          ...       ...
199995  0.620268  0.215945
199996  0.420360  0.544633
199997  0.793744  0.153501
199998  0.253700  0.588854
199999  0.201630  0.177926

[200000 rows x 2 columns]
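xs can also key on several levels at once, which pins the selection to both the sample and the T value in a single call (list-likes for the level argument are supported in recent pandas versions):
pd.concat(
    [df.xs(key=('A', 0.2), level=['Sample', 'T'], axis=1, drop_level=False),
     df.xs(key=('C', 0.5), level=['Sample', 'T'], axis=1, drop_level=False)],
    axis=1)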
A look at the performance of the various solutions:
# let's increase size of the dataframe
In [54]: df = pd.concat([df]*20_000, ignore_index=True)
In [55]: df.shape
Out[55]: (200000, 6)
# manual form
In [56]: %%timeit
    ...: cond1 = (df.columns.get_level_values('Sample') == 'A') & (df.columns.get_level_values('T') == 0.2)
    ...: cond2 = (df.columns.get_level_values('Sample') == 'C') & (df.columns.get_level_values('T') == 0.5)
    ...: indexer = df.columns[cond1 | cond2]
    ...: df.loc[:, indexer]
    ...:
680 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# third party library - pyjanitor
In [57]: %timeit df.select(columns=[{'Sample':'A', 'T':0.2}, {'Sample':'C', 'T':0.5}])
603 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# panda kim's solution - droplevel discards the P1/P2 levels
# so isin can match on (Sample, T) tuples
In [58]: %timeit df.loc[:, df.columns.droplevel([1, 2]).isin([('A', 0.2), ('C', 0.5)])]
753 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# xs + concat
%timeit pd.concat([df.xs(key=0.2, level='T', axis=1, drop_level=False), df.xs(key=0.5, level='T', axis=1, drop_level=False)], axis=1)
1.03 ms ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# query seems expensive - not sure why
In [59]: %timeit df.T.query('(Sample == "A" & T==0.2) or (Sample == "C" & T==0.5)').T
109 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
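My guess - an assumption, not profiled - is that most of the query cost comes from the two transposes, each of which copies the full 200,000-row frame. Querying just the column index sidesteps the copies; a sketch:
# run query against the levels of the column index instead of the transposed frame
idx = df.columns.to_frame(index=False).query(
    '(Sample == "A" & T == 0.2) or (Sample == "C" & T == 0.5)').index
df.iloc[:, idx]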