
I have a DataFrame:

dfs = pd.read_csv(StringIO("""
      datetime        ID  C_1 C_2  C_3   C_4 C_5 C_6
"18/06/2023 3:51:50"  136 101 2024  89    4   3   13
"18/06/2023 3:51:52"  136 101 2028  61    4   3   18
"18/06/2023 3:51:53"  24  101 2029  65    0   0   0
"18/06/2023 3:51:53"  24  102 2022  89    0   0   0
"18/06/2023 3:51:54"  136 102 2045  66    2   3   4
"18/06/2023 3:51:55"  0   101 2022  89    0   0   0
"18/06/2023 3:51:56"  136 101 2222  77    0   0   0
"18/06/2023 3:51:56"  24  102 2022  89    0   0   0
"18/06/2023 3:51:57"  136 101 2024  90    0   0   0
"18/06/2023 3:51:57"  24  101 2026  87    0   1   8
"18/06/2023 3:51:58"  0   102 2045  44    43  42  41
"18/06/2023 3:51:59"  24  102 2043  33    0   1   8
"18/06/2023 3:52:01"  24  101 2022  89    1   4   76
"18/06/2023 3:52:03"  24  102 2046  31    0   1   6
"18/06/2023 3:52:18"  136 101 3333  99    0   1   87
"18/06/2023 3:52:54"  136 102 2045  66    2   3   4
"""), sep="\s+")

Is there a way to get the first and the last row (one pair for ID=136 and one pair for ID=24) for every distinct C_1?

The code below works as expected, but I'm looking for a simpler and faster solution:

filter_1 = dfs['ID'].isin([136])  # ID is parsed as an integer column, so compare with ints
filter_2 = dfs['ID'].isin([24])
test_df1 = dfs.loc[filter_1, :]
test_df2 = dfs.loc[filter_2, :]
g1 = test_df1.groupby('C_1')
g2 = test_df2.groupby('C_1')
final_df1 = pd.concat([g1.head(1), g1.tail(1)]).drop_duplicates().sort_values('C_1').reset_index(drop=True)
final_df2 = pd.concat([g2.head(1), g2.tail(1)]).drop_duplicates().sort_values('C_1').reset_index(drop=True)
#merge final_df1 & final_df2
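For the commented merge step, one possible sketch (rebuilding the two partial results from the sample data, then concatenating and re-sorting by C_1 and timestamp to match the output below):

```python
import pandas as pd
from io import StringIO

# Same sample data as above; ID is parsed as an integer column
dfs = pd.read_csv(StringIO('''
      datetime        ID  C_1 C_2  C_3   C_4 C_5 C_6
"18/06/2023 3:51:50"  136 101 2024  89    4   3   13
"18/06/2023 3:51:52"  136 101 2028  61    4   3   18
"18/06/2023 3:51:53"  24  101 2029  65    0   0   0
"18/06/2023 3:51:53"  24  102 2022  89    0   0   0
"18/06/2023 3:51:54"  136 102 2045  66    2   3   4
"18/06/2023 3:51:55"  0   101 2022  89    0   0   0
"18/06/2023 3:51:56"  136 101 2222  77    0   0   0
"18/06/2023 3:51:56"  24  102 2022  89    0   0   0
"18/06/2023 3:51:57"  136 101 2024  90    0   0   0
"18/06/2023 3:51:57"  24  101 2026  87    0   1   8
"18/06/2023 3:51:58"  0   102 2045  44    43  42  41
"18/06/2023 3:51:59"  24  102 2043  33    0   1   8
"18/06/2023 3:52:01"  24  101 2022  89    1   4   76
"18/06/2023 3:52:03"  24  102 2046  31    0   1   6
"18/06/2023 3:52:18"  136 101 3333  99    0   1   87
"18/06/2023 3:52:54"  136 102 2045  66    2   3   4
'''), sep=r"\s+")

# First and last row per C_1, separately for each ID
g1 = dfs[dfs['ID'].eq(136)].groupby('C_1')
g2 = dfs[dfs['ID'].eq(24)].groupby('C_1')
final_df1 = pd.concat([g1.head(1), g1.tail(1)]).drop_duplicates()
final_df2 = pd.concat([g2.head(1), g2.tail(1)]).drop_duplicates()

# Combine both partial results and order by C_1, then timestamp
final_df = (pd.concat([final_df1, final_df2])
            .sort_values(['C_1', 'datetime'])
            .reset_index(drop=True))
print(final_df)
```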

Output -

      datetime        ID  C_1 C_2  C_3   C_4 C_5 C_6
"18/06/2023 3:51:50"  136 101 2024  89    4   3   13
"18/06/2023 3:51:53"  24  101 2029  65    0   0   0
"18/06/2023 3:52:01"  24  101 2022  89    1   4   76
"18/06/2023 3:52:18"  136 101 3333  99    0   1   87
"18/06/2023 3:51:53"  24  102 2022  89    0   0   0
"18/06/2023 3:51:54"  136 102 2045  66    2   3   4
"18/06/2023 3:52:03"  24  102 2046  31    0   1   6
"18/06/2023 3:52:54"  136 102 2045  66    2   3   4

1 Answer

You could use cumcount and boolean indexing:

N = 1 # number of rows to keep per ID/C_1
g = dfs.groupby(['ID', 'C_1'])
out = dfs[g.cumcount().lt(N) | g.cumcount(ascending=False).lt(N)]

If you also want to filter on the ID:

N = 1
g = dfs.groupby(['ID', 'C_1'])
m = dfs['ID'].isin([24, 136])  # ID is an integer column here
out = dfs[m & (g.cumcount().lt(N) | g.cumcount(ascending=False).lt(N))]

Output:

              datetime   ID  C_1   C_2  C_3  C_4  C_5  C_6
0   18/06/2023 3:51:50  136  101  2024   89    4    3   13
2   18/06/2023 3:51:53   24  101  2029   65    0    0    0
3   18/06/2023 3:51:53   24  102  2022   89    0    0    0
4   18/06/2023 3:51:54  136  102  2045   66    2    3    4
12  18/06/2023 3:52:01   24  101  2022   89    1    4   76
13  18/06/2023 3:52:03   24  102  2046   31    0    1    6
14  18/06/2023 3:52:18  136  101  3333   99    0    1   87
15  18/06/2023 3:52:54  136  102  2045   66    2    3    4
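Not part of the original answer, but worth noting as a sketch: in recent pandas (2.0+), GroupBy.nth accepts a list of positions and acts as a filter that keeps the original index, so the same first/last selection can be written in one call:

```python
import pandas as pd
from io import StringIO

dfs = pd.read_csv(StringIO('''
      datetime        ID  C_1 C_2  C_3   C_4 C_5 C_6
"18/06/2023 3:51:50"  136 101 2024  89    4   3   13
"18/06/2023 3:51:52"  136 101 2028  61    4   3   18
"18/06/2023 3:51:53"  24  101 2029  65    0   0   0
"18/06/2023 3:51:53"  24  102 2022  89    0   0   0
"18/06/2023 3:51:54"  136 102 2045  66    2   3   4
"18/06/2023 3:51:55"  0   101 2022  89    0   0   0
"18/06/2023 3:51:56"  136 101 2222  77    0   0   0
"18/06/2023 3:51:56"  24  102 2022  89    0   0   0
"18/06/2023 3:51:57"  136 101 2024  90    0   0   0
"18/06/2023 3:51:57"  24  101 2026  87    0   1   8
"18/06/2023 3:51:58"  0   102 2045  44    43  42  41
"18/06/2023 3:51:59"  24  102 2043  33    0   1   8
"18/06/2023 3:52:01"  24  101 2022  89    1   4   76
"18/06/2023 3:52:03"  24  102 2046  31    0   1   6
"18/06/2023 3:52:18"  136 101 3333  99    0   1   87
"18/06/2023 3:52:54"  136 102 2045  66    2   3   4
'''), sep=r"\s+")

m = dfs['ID'].isin([24, 136])
# nth([0, -1]) keeps the first and the last row of each (ID, C_1) group
out = dfs[m].groupby(['ID', 'C_1']).nth([0, -1])
print(out)
```

With older pandas the return shape of nth differs (group keys end up in the index), so the cumcount approach above is the more version-robust choice.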

Intermediates (g is cumcount(), g_desc is cumcount(ascending=False), isin is the ID filter, and selection is the final mask):

              datetime   ID  C_1   C_2  C_3  C_4  C_5  C_6  g  g_desc   isin  selection
0   18/06/2023 3:51:50  136  101  2024   89    4    3   13  0       4   True       True
1   18/06/2023 3:51:52  136  101  2028   61    4    3   18  1       3   True      False
2   18/06/2023 3:51:53   24  101  2029   65    0    0    0  0       2   True       True
3   18/06/2023 3:51:53   24  102  2022   89    0    0    0  0       3   True       True
4   18/06/2023 3:51:54  136  102  2045   66    2    3    4  0       1   True       True
5   18/06/2023 3:51:55    0  101  2022   89    0    0    0  0       0  False      False
6   18/06/2023 3:51:56  136  101  2222   77    0    0    0  2       2   True      False
7   18/06/2023 3:51:56   24  102  2022   89    0    0    0  1       2   True      False
8   18/06/2023 3:51:57  136  101  2024   90    0    0    0  3       1   True      False
9   18/06/2023 3:51:57   24  101  2026   87    0    1    8  1       1   True      False
10  18/06/2023 3:51:58    0  102  2045   44   43   42   41  0       0  False      False
11  18/06/2023 3:51:59   24  102  2043   33    0    1    8  2       1   True      False
12  18/06/2023 3:52:01   24  101  2022   89    1    4   76  2       0   True       True
13  18/06/2023 3:52:03   24  102  2046   31    0    1    6  3       0   True       True
14  18/06/2023 3:52:18  136  101  3333   99    0    1   87  4       0   True       True
15  18/06/2023 3:52:54  136  102  2045   66    2    3    4  1       0   True       True
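The intermediate columns above can be reproduced with something along these lines (debug is just an illustrative name, not from the answer):

```python
import pandas as pd
from io import StringIO

dfs = pd.read_csv(StringIO('''
      datetime        ID  C_1 C_2  C_3   C_4 C_5 C_6
"18/06/2023 3:51:50"  136 101 2024  89    4   3   13
"18/06/2023 3:51:52"  136 101 2028  61    4   3   18
"18/06/2023 3:51:53"  24  101 2029  65    0   0   0
"18/06/2023 3:51:53"  24  102 2022  89    0   0   0
"18/06/2023 3:51:54"  136 102 2045  66    2   3   4
"18/06/2023 3:51:55"  0   101 2022  89    0   0   0
"18/06/2023 3:51:56"  136 101 2222  77    0   0   0
"18/06/2023 3:51:56"  24  102 2022  89    0   0   0
"18/06/2023 3:51:57"  136 101 2024  90    0   0   0
"18/06/2023 3:51:57"  24  101 2026  87    0   1   8
"18/06/2023 3:51:58"  0   102 2045  44    43  42  41
"18/06/2023 3:51:59"  24  102 2043  33    0   1   8
"18/06/2023 3:52:01"  24  101 2022  89    1   4   76
"18/06/2023 3:52:03"  24  102 2046  31    0   1   6
"18/06/2023 3:52:18"  136 101 3333  99    0   1   87
"18/06/2023 3:52:54"  136 102 2045  66    2   3   4
'''), sep=r"\s+")

N = 1
g = dfs.groupby(['ID', 'C_1'])
debug = dfs.assign(
    g=g.cumcount(),                        # position within the group, counting forward
    g_desc=g.cumcount(ascending=False),    # position counting backward (0 = last row)
    isin=dfs['ID'].isin([24, 136]),        # ID filter
)
debug['selection'] = debug['isin'] & (debug['g'].lt(N) | debug['g_desc'].lt(N))
print(debug)
```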

1 Comment

Thanks for the solution. It's working as expected.
