Drop rows based on conditions (longitudinal dataset)

Question

I have a dataset with the following structure:

ID	drug	drug sequence	drug type
A	X	1	AB
A	X	1	AB
A	Y	2	CD
A	Z	3	CD
A	V	4	EF
A	V	4	EF
B	W	1	GH
B	W	1	GH
B	W	2	GH
C	Z	1	CD
C	V	2	EF
C	V	2	EF

I want to keep only the patients who were treated with CD and EF (and no other drug type), and exclude all other patients. The final result should be:

ID	drug	drug sequence	drug type
C	Z	1	CD
C	V	2	EF
C	V	2	EF

Patient A is excluded because, even though they were treated with both CD and EF, they had previously been treated with AB.

I have tried things like:

df_group = df.groupby("ID").filter(lambda x : (x["drug type"]=="CD").all())

But it is not working properly. I'd appreciate any feedback.
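A minimal sketch of why the attempt above does not give the expected result, using a trimmed copy of the sample data (only the ID and drug type columns). (x["drug type"] == "CD").all() keeps a patient only if every one of their rows is CD, which no patient here satisfies. Merely requiring that both CD and EF appear is not enough either, because patient A also has them.

```python
import pandas as pd

# Trimmed copy of the question's sample data
df = pd.DataFrame({
    'ID': list('AAAAAABBBCCC'),
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF'],
})

# The attempted filter: a patient survives only if *every* row is CD.
# Patient C also has EF rows, so nobody survives and the result is empty.
attempt = df.groupby('ID').filter(lambda x: (x['drug type'] == 'CD').all())
print(attempt.empty)  # True

# Requiring that CD and EF both *appear* keeps A as well,
# because A's earlier AB rows are never checked.
both_present = df.groupby('ID').filter(
    lambda x: (x['drug type'] == 'CD').any() and (x['drug type'] == 'EF').any()
)
print(sorted(both_present['ID'].unique()))  # ['A', 'C']
```

The condition has to look at the whole set of drug types per patient, which is what the answers below do.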

Just for clarity, you would exclude patients treated with only CD or only EF, right?
– mozway, Commented Aug 26, 2024 at 15:20
@ReumaptSPR please have a look at the answer I shared; it should give you the desired result by keeping only the patients treated with CD and EF.
– mrconcerned, Commented Aug 26, 2024 at 15:48
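On mozway's question: a check of the form set(drug types) == {'CD', 'EF'} drops patients treated with only CD or only EF. If such patients should be kept instead, a subset test (<=) is one option. This sketch adds a hypothetical patient D (CD only) to the sample data purely to show the difference; patient D and the name allowed are illustrative and not part of the original question.

```python
import pandas as pd

allowed = {'CD', 'EF'}

# Question data (trimmed) plus a hypothetical patient D treated with CD only
df = pd.DataFrame({
    'ID': list('AAAAAABBBCCCD'),
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF', 'CD'],
})

types = df.groupby('ID')['drug type']

# Exactly CD and EF: patients with only one of the two are dropped
exact = df[types.transform(lambda s: set(s) == allowed)]

# Nothing outside {CD, EF}: one or both are fine, so D is kept too
subset = df[types.transform(lambda s: set(s) <= allowed)]

print(sorted(exact['ID'].unique()))   # ['C']
print(sorted(subset['ID'].unique()))  # ['C', 'D']
```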

mrconcerned · Accepted Answer · 2024-08-26 15:11:27Z

You can group the data by ID, collect the unique drug types for each patient, and keep only the groups whose drug types are exactly CD and EF. groupby('ID').filter(...) then returns the rows of the patients that meet this condition.


import pandas as pd

# Sample Data
data = {
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'drug': ['X', 'X', 'Y', 'Z', 'V', 'V', 'W', 'W', 'W', 'Z', 'V', 'V'],
    'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF']
}

df = pd.DataFrame(data)

# Keep a patient only if their set of drug types is exactly {CD, EF}
def filter_patients(group):
    unique_drug_types = set(group['drug type'])
    return unique_drug_types == {'CD', 'EF'}

filtered_df = df.groupby('ID').filter(filter_patients)

# Display the result
print(filtered_df)

So, the result would be:

  ID drug  drug sequence drug type
9   C    Z              1        CD
10  C    V              2        EF
11  C    V              2        EF

mozway · Accepted Answer · 2024-08-26 15:17:00Z


You can use a set equality with groupby.transform and boolean indexing:

out = df[
    df.groupby('ID')['drug type'].transform(lambda x: set(x) == {'CD', 'EF'})
]

Output:

   ID drug  drug sequence drug type
9   C    Z              1        CD
10  C    V              2        EF
11  C    V              2        EF

Intermediate:

df.groupby('ID')['drug type'].transform(lambda x: set(x) == {'CD', 'EF'})

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9      True
10     True
11     True
Name: drug type, dtype: bool
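A lambda-free variant of the same mask, sketched under the assumption that the column is named "drug type": a patient matches when all of their rows are CD or EF and they have exactly two distinct drug types, which together is the same condition as set(x) == {'CD', 'EF'}. It uses only built-in groupby aggregations ('all', 'nunique') instead of calling a Python function per group; whether that is noticeably faster on a real longitudinal dataset has not been benchmarked here.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': list('AAAAAABBBCCC'),
    'drug': list('XXYZVVWWWZVV'),
    'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF'],
})

allowed = ['CD', 'EF']

# True on every row of a patient whose rows are all CD or EF
only_allowed = df['drug type'].isin(allowed).groupby(df['ID']).transform('all')

# True on every row of a patient with exactly two distinct drug types
two_types = df.groupby('ID')['drug type'].transform('nunique').eq(len(allowed))

# Both conditions together mean the patient's drug types are exactly {CD, EF}
out = df[only_allowed & two_types]
print(out)
```

Printing out should show the same three rows for patient C (index 9, 10, 11) as the output above.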

answered Aug 26, 2024 at 15:17


Markus Wanx · Accepted Answer · 2024-08-26 15:14:27Z


Something like this should work:

import pandas as pd

data = {'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'drug': ['X', 'X', 'Y', 'Z', 'V', 'V', 'W', 'W', 'W', 'Z', 'V', 'V'],
        'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
        'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF']}

df = pd.DataFrame(data)

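# Unique drug types for each patient (one array per ID)
patient_drugs = df.groupby('ID')['drug type'].unique()
# IDs whose drug types are exactly CD and EF
valid_patients = patient_drugs[patient_drugs.apply(lambda x: set(x) == {'CD', 'EF'})].index
# Keep all rows belonging to those patients
filtered_df = df[df['ID'].isin(valid_patients)]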

print(filtered_df)

answered Aug 26, 2024 at 15:14
