I have a dataset with the following structure:
| ID | drug | drug sequence | drug type |
|---|---|---|---|
| A | X | 1 | AB |
| A | X | 1 | AB |
| A | Y | 2 | CD |
| A | Z | 3 | CD |
| A | V | 4 | EF |
| A | V | 4 | EF |
| B | W | 1 | GH |
| B | W | 1 | GH |
| B | W | 2 | GH |
| C | Z | 1 | CD |
| C | V | 2 | EF |
| C | V | 2 | EF |
I want to keep only patients who were treated with CD and EF and nothing else, and exclude all other patients. The final result should be:
| ID | drug | drug sequence | drug type |
|---|---|---|---|
| C | Z | 1 | CD |
| C | V | 2 | EF |
| C | V | 2 | EF |
Patient A is excluded because, although they were treated with both CD and EF, they had previously been treated with AB.
I have tried things like:

    df_group = df.groupby("id").filter(lambda x : (x["drug type"]=="CD").all())
But it is not working properly. I'd appreciate any feedback.
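
For reference, here is a minimal sketch of one possible approach, assuming the columns are named exactly as in the table above (`ID`, `drug type`) and that "treated with CD & EF" means the patient's set of drug types is exactly `{"CD", "EF"}`. The attempt above keeps a group only when every row is `CD`, which would drop patient C as well, and it groups on `"id"` while the table header is `ID`. The name `required` is just an illustrative variable.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ID": list("AAAAAABBBCCC"),
    "drug": ["X", "X", "Y", "Z", "V", "V", "W", "W", "W", "Z", "V", "V"],
    "drug sequence": [1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 2, 2],
    "drug type": ["AB", "AB", "CD", "CD", "EF", "EF",
                  "GH", "GH", "GH", "CD", "EF", "EF"],
})

# Assumption: a patient qualifies only if their drug types are exactly CD and EF.
required = {"CD", "EF"}

# Keep a whole group when the set of its drug types equals the required set.
result = df.groupby("ID").filter(lambda g: set(g["drug type"]) == required)
print(result)
```

If patients treated with only CD or only EF should also be kept, the comparison could be changed to `set(g["drug type"]) <= required`.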