
I have a dataset with the following structure:

ID  drug  drug sequence  drug type
A   X     1              AB
A   X     1              AB
A   Y     2              CD
A   Z     3              CD
A   V     4              EF
A   V     4              EF
B   W     1              GH
B   W     1              GH
B   W     2              GH
C   Z     1              CD
C   V     2              EF
C   V     2              EF

I want to keep only the patients who were treated with CD and EF (both, and nothing else) and exclude all other patients. The final result should be:

ID  drug  drug sequence  drug type
C   Z     1              CD
C   V     2              EF
C   V     2              EF

Patient A is excluded because, although it was treated with both CD and EF, it had previously been treated with AB. Patient B is excluded because it was only treated with GH.
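Put another way, the rule compares each patient's set of distinct drug types with {CD, EF}. A minimal sketch of that per-patient view, rebuilding the sample data from the table above (the variable names `types_per_patient` and `keep_ids` are only illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'drug': ['X', 'X', 'Y', 'Z', 'V', 'V', 'W', 'W', 'W', 'Z', 'V', 'V'],
    'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF'],
})

# Distinct drug types received by each patient (groupby iterates keys in sorted order)
types_per_patient = {pid: set(types) for pid, types in df.groupby('ID')['drug type']}

# A patient qualifies only if that set is exactly {'CD', 'EF'}
keep_ids = [pid for pid, types in types_per_patient.items() if types == {'CD', 'EF'}]
print(keep_ids)  # ['C']
```

Here A maps to {AB, CD, EF}, B to {GH} and C to {CD, EF}, so only C passes.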

I have tried things like:

df_group = df.groupby("ID").filter(lambda x: (x["drug type"] == "CD").all())

But it does not work as intended: the condition keeps a patient only if every one of its rows is CD, so no patient passes. I'd appreciate any feedback.
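For reference, a small sketch of what the attempted condition does on the sample data (only the ID and drug type columns are rebuilt here). `(x["drug type"] == "CD").all()` requires every row of a patient to be CD; loosening it to `isin(['CD', 'EF']).all()` gets closer, though on its own it does not require both types to be present:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF'],
})

# The attempted condition: keep a patient only if *every* row has type 'CD'.
# No patient in the sample received only CD, so nothing survives.
attempt = df.groupby('ID').filter(lambda g: (g['drug type'] == 'CD').all())
print(attempt.empty)  # True

# Loosened condition: keep a patient if every row is either CD or EF
loosened = df.groupby('ID').filter(lambda g: g['drug type'].isin(['CD', 'EF']).all())
print(loosened['ID'].unique().tolist())  # ['C']
```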

  • Just for clarity, you would exclude patients treated with only CD or only EF, right? Commented Aug 26, 2024 at 15:20
    @mozway I would keep patients only treated with CD and EF Commented Aug 26, 2024 at 15:34
  • @ReumaptSPR please have a look at the answer I shared; it should give you the desired result by keeping only the patients treated with CD and EF. Commented Aug 26, 2024 at 15:48
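The distinction raised in the comments is the difference between a subset check and an equality check on each patient's set of drug types. The sketch below adds a hypothetical patient D, treated only with CD, which is not part of the original data:

```python
import pandas as pd

# Hypothetical extra patient 'D' (only CD) added purely for illustration
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF', 'CD'],
})
required = {'CD', 'EF'}

# Subset check: every type the patient received is CD or EF (D passes)
subset_ids = [pid for pid, t in df.groupby('ID')['drug type'] if set(t) <= required]

# Equality check: the patient received exactly CD and EF (D fails)
exact_ids = [pid for pid, t in df.groupby('ID')['drug type'] if set(t) == required]

print(subset_ids)  # ['C', 'D']
print(exact_ids)   # ['C']
```

Since the asker wants patients treated with both types, the equality check is the one that matches the requirement.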

3 Answers


You can group the data by ID, collect the distinct drug types for each patient, and keep only the groups whose set of types is exactly {CD, EF}; groupby.filter then returns the rows of the patients that pass the check.


import pandas as pd

# Sample Data
data = {
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'drug': ['X', 'X', 'Y', 'Z', 'V', 'V', 'W', 'W', 'W', 'Z', 'V', 'V'],
    'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF']
}

df = pd.DataFrame(data)

# Filter the dataframe
def filter_patients(group):
    unique_drug_types = set(group['drug type'])
    return unique_drug_types == {'CD', 'EF'}

filtered_df = df.groupby('ID').filter(filter_patients)

# Display the result
print(filtered_df)

So, the result would be:

  ID drug  drug sequence drug type
9   C    Z              1        CD
10  C    V              2        EF
11  C    V              2        EF


You can use a set equality with groupby.transform and boolean indexing:

out = df[
    df.groupby('ID')['drug type'].transform(lambda x: set(x) == {'CD', 'EF'})
]

Output:

   ID drug  drug sequence drug type
9   C    Z              1        CD
10  C    V              2        EF
11  C    V              2        EF

Intermediate result (the boolean mask used for indexing):

df.groupby('ID')['drug type'].transform(lambda x: set(x) == {'CD', 'EF'})

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9      True
10     True
11     True
Name: drug type, dtype: bool
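If the lambda-based transform turns out to be slow with many patients, one alternative (a sketch, not part of the original answer) expresses the same condition with built-in group reductions instead of a per-group Python function: every row of the patient is CD or EF, and the patient has exactly two distinct types. Together these two conditions are equivalent to the set equality.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'drug': ['X', 'X', 'Y', 'Z', 'V', 'V', 'W', 'W', 'W', 'Z', 'V', 'V'],
    'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF'],
})
required = ['CD', 'EF']

# True on each row whose patient has only CD/EF rows
only_required = df['drug type'].isin(required).groupby(df['ID']).transform('all')

# True on each row whose patient has exactly len(required) distinct types
has_all = df.groupby('ID')['drug type'].transform('nunique').eq(len(required))

out = df[only_required & has_all]
print(out)
```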


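Something like the following should work: collect the unique drug types per patient, keep the IDs whose set of types equals {'CD', 'EF'}, and select those patients' rows with isin.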

import pandas as pd

data = {'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'drug': ['X', 'X', 'Y', 'Z', 'V', 'V', 'W', 'W', 'W', 'Z', 'V', 'V'],
        'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
        'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF']}

df = pd.DataFrame(data)

# Unique drug types received by each patient
patient_drugs = df.groupby('ID')['drug type'].unique()
# IDs whose set of drug types is exactly {'CD', 'EF'}
valid_patients = patient_drugs[patient_drugs.apply(lambda x: set(x) == {'CD', 'EF'})].index
# Keep every row belonging to those patients
filtered_df = df[df['ID'].isin(valid_patients)]

print(filtered_df)
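As a quick sanity check, the three approaches in this thread can be run side by side on the same sample data; a sketch, assuming the column names from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'drug': ['X', 'X', 'Y', 'Z', 'V', 'V', 'W', 'W', 'W', 'Z', 'V', 'V'],
    'drug sequence': [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
    'drug type': ['AB', 'AB', 'CD', 'CD', 'EF', 'EF', 'GH', 'GH', 'GH', 'CD', 'EF', 'EF'],
})
target = {'CD', 'EF'}

# Approach 1: groupby.filter
via_filter = df.groupby('ID').filter(lambda g: set(g['drug type']) == target)

# Approach 2: groupby.transform + boolean indexing
via_transform = df[df.groupby('ID')['drug type'].transform(lambda x: set(x) == target)]

# Approach 3: per-patient unique types, then isin on the qualifying IDs
patient_drugs = df.groupby('ID')['drug type'].unique()
valid = patient_drugs[patient_drugs.apply(lambda x: set(x) == target)].index
via_isin = df[df['ID'].isin(valid)]

# All three select the same rows, keeping the original index labels
assert via_filter.equals(via_transform) and via_transform.equals(via_isin)
print(via_filter.index.tolist())  # [9, 10, 11]
```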
