I want to create a `target_start` column in Python (pandas):
| id | start | end | diff | target_start |
|---|---|---|---|---|
| 12220 | 1999-11-22 | 2008-08-31 | 3515 | 1999-11-22 |
| 12220 | 2018-04-16 | 2019-09-15 | 1 | 2018-04-16 |
| 12220 | 2019-09-16 | 2019-11-30 | 1 | 2018-04-16 |
| 12220 | 2019-12-01 | 2020-03-31 | 1 | 2018-04-16 |
| 12220 | 2020-04-01 | 2020-06-30 | -711 | 2018-04-16 |
| 11132 | 2018-07-20 | 2019-09-15 | 1 | 2018-07-20 |
| 11132 | 2019-09-16 | 2021-01-01 | -44197 | 2018-07-20 |
This is easy to solve in Excel, but I don't know how to do it in Python. The first row's `target_start` is just its own `start`:
df.loc[df.index==0,'target_start']= df['start']
For the remaining rows I tried this code, but it didn't work:
import pandas as pd
df = pd.read_excel('./Shift.xlsx')
# if id != previous id: target_start = start
df.loc[df['id'] != df['id'].shift(1), 'target_start'] = df['start']
# elif previous diff != 1: target_start = start
df.loc[df['diff'].shift(1) != 1, 'target_start'] = df['start']
# else: target_start = previous target_start
df.loc[(df.index != 0) & (df['id'] == df['id'].shift(1)) & (df['diff'].shift(1) == 1), 'target_start'] = df['target_start'].shift(1)
print(df)
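For anyone who wants to reproduce this without `Shift.xlsx`, here is a minimal sketch that builds the same rows inline. I'm assuming `read_excel` parses `start` and `end` as datetimes and `diff` as integers; the values are copied from the table above.

```python
import pandas as pd

# Sample rows copied from the table in the question.
# Assumption: start/end are datetime64 columns, diff is an integer column.
df = pd.DataFrame({
    "id": [12220, 12220, 12220, 12220, 12220, 11132, 11132],
    "start": pd.to_datetime([
        "1999-11-22", "2018-04-16", "2019-09-16", "2019-12-01",
        "2020-04-01", "2018-07-20", "2019-09-16",
    ]),
    "end": pd.to_datetime([
        "2008-08-31", "2019-09-15", "2019-11-30", "2020-03-31",
        "2020-06-30", "2019-09-15", "2021-01-01",
    ]),
    "diff": [3515, 1, 1, 1, -711, 1, -44197],
})
```

Using this frame in place of the `read_excel` call should give the same output as with the file.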
The result is:
| id | start | end | diff | target_start |
|---|---|---|---|---|
| 12220 | 1999-11-22 | 2008-08-31 | 3515 | 1999-11-22 |
| 12220 | 2018-04-16 | 2019-09-15 | 1 | 2018-04-16 |
| 12220 | 2019-09-16 | 2019-11-30 | 1 | 2018-04-16 |
| 12220 | 2019-12-01 | 2020-03-31 | 1 | NaT |
| 12220 | 2020-04-01 | 2020-06-30 | -711 | NaT |
| 11132 | 2018-07-20 | 2019-09-15 | 1 | 2018-07-20 |
| 11132 | 2019-09-16 | 2021-01-01 | -44197 | 2018-07-20 |
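As far as I can tell, the last assignment fails because `df['target_start'].shift(1)` is evaluated once, before any of the new values are written. A row can therefore only copy from a neighbour that was already filled by the first two steps, and the value never propagates further down. A sketch of what might work instead: mark the rows that start a new block, keep `start` only on those rows, and forward-fill within each `id`. The sample frame and the name `starts_block` are my own for illustration, and I'm not sure this is the idiomatic way.

```python
import pandas as pd

# Sample rows from the question (only the columns the rule needs).
df = pd.DataFrame({
    "id": [12220, 12220, 12220, 12220, 12220, 11132, 11132],
    "start": pd.to_datetime([
        "1999-11-22", "2018-04-16", "2019-09-16", "2019-12-01",
        "2020-04-01", "2018-07-20", "2019-09-16",
    ]),
    "diff": [3515, 1, 1, 1, -711, 1, -44197],
})

# Previous row's diff within the same id; NaN on the first row of each id.
prev_diff = df.groupby("id")["diff"].shift(1)

# A row starts a new block when the previous diff is not 1.
# NaN != 1 evaluates to True, so the first row of every id also counts.
starts_block = prev_diff.ne(1)

# Keep start on block-starting rows, NaT elsewhere, then carry it forward
# within each id so the value propagates down the whole block.
df["target_start"] = df["start"].where(starts_block).groupby(df["id"]).ffill()

print(df)
```

On the sample rows this gives the `target_start` values from my desired table. It assumes the rows of each `id` are already in chronological order.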
Does anyone know how to solve this? Thanks in advance!
