I want to create a 'target_start' column in Python:

id start end diff target_start
12220 1999-11-22 2008-08-31 3515 1999-11-22
12220 2018-04-16 2019-09-15 1 2018-04-16
12220 2019-09-16 2019-11-30 1 2018-04-16
12220 2019-12-01 2020-03-31 1 2018-04-16
12220 2020-04-01 2020-06-30 -711 2018-04-16
11132 2018-07-20 2019-09-15 1 2018-07-20
11132 2019-09-16 2021-01-01 -44197 2018-07-20
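
For anyone who wants to reproduce this without the Excel file, here is a minimal sketch that builds the same rows directly. It assumes the column names shown above and leaves out target_start, since that is the column to be computed.

```python
import pandas as pd

# Sample rows copied from the table above (target_start omitted on purpose).
df = pd.DataFrame({
    "id": [12220, 12220, 12220, 12220, 12220, 11132, 11132],
    "start": pd.to_datetime([
        "1999-11-22", "2018-04-16", "2019-09-16", "2019-12-01",
        "2020-04-01", "2018-07-20", "2019-09-16",
    ]),
    "end": pd.to_datetime([
        "2008-08-31", "2019-09-15", "2019-11-30", "2020-03-31",
        "2020-06-30", "2019-09-15", "2021-01-01",
    ]),
    "diff": [3515, 1, 1, 1, -711, 1, -44197],
})

print(df)
```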

This is easy to solve in Excel:

[screenshot of the Excel sheet with the formula highlighted]

but I don't know how to do this in Python. The first row of target_start is simply the row's own start:

df.loc[df.index==0,'target_start']= df['start']

I tried this code, but it didn't work:

import pandas as pd
df=pd.read_excel('./Shift.xlsx')

#if id != id.shift(1) then target_start = start
df.loc[df['id'] != df['id'].shift(1), 'target_start'] = df['start']

#elif: diff != 1 then target_start = start
df.loc[df['diff'].shift(1) != 1, 'target_start'] = df['start']

#else: target_start = target_start.shift(1)
df.loc[(df.index != 0) & (df['id'] == df['id'].shift(1)) & (df['diff'].shift(1) == 1), 'target_start']=df['target_start'].shift(1)

print(df)

The result is:

id start end diff target_start
12220 1999-11-22 2008-08-31 3515 1999-11-22
12220 2018-04-16 2019-09-15 1 2018-04-16
12220 2019-09-16 2019-11-30 1 2018-04-16
12220 2019-12-01 2020-03-31 1 NaT
12220 2020-04-01 2020-06-30 -711 NaT
11132 2018-07-20 2019-09-15 1 2018-07-20
11132 2019-09-16 2021-01-01 -44197 2018-07-20

Does anyone know how to solve this? Thanks in advance!

  • What do you mean by "Does not work"? What is the actual output? Commented Jan 19, 2021 at 8:57
  • @LeoE: I added it to the post. The problem is the NaT values in the target_start column. Commented Jan 19, 2021 at 9:06
  • Please accept it as an answer if it solved the problem. Thank you :) Commented Jan 19, 2021 at 10:17

2 Answers

Here is how I would implement the Excel formula you highlighted:

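import pandas as pd

# Make sure the date columns are real datetimes.
df.start = pd.to_datetime(df.start)
df.end = pd.to_datetime(df.end)

# Column positions: 0 = id, 1 = start, 3 = diff.
# The first row always takes its own start.
target_start = [df.iloc[0, 1]]

for i in range(1, df.shape[0]):
    if df.iloc[i, 0] != df.iloc[i - 1, 0]:
        # The id changed: restart from this row's start.
        target_start.append(df.iloc[i, 1])
    else:
        if df.iloc[i, 3] == 1:
            # Same id and this row's diff is 1: take this row's start.
            target_start.append(df.iloc[i, 1])
        else:
            # Otherwise carry the previous target_start forward.
            target_start.append(target_start[i - 1])

# target_start is built from scratch, so any existing column is overwritten.
df["target_start"] = target_start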

It generates the following result:

   id     start       end         diff    target_start
0  12220  1999-11-22  2008-08-31  3515    1999-11-22
1  12220  2018-04-16  2019-09-15  1       2018-04-16
2  12220  2019-09-16  2019-11-30  1       2019-09-16
3  12220  2019-12-01  2020-03-31  1       2019-12-01
4  12220  2020-04-01  2020-06-30  -711    2019-12-01
5  11132  2018-07-20  2019-09-15  1       2018-07-20
6  11132  2019-09-16  2021-01-01  -44197  2018-07-20

Thank you @quest! It is fantastic :)

I changed one thing after the first else: the check has to look at the previous row's diff and test that it is not 1:

        else:
            if df.iloc[i-1, 3] != 1:
                target_start.append(df.iloc[i, 1])

So the final code is:

import pandas as pd

# Make sure the date columns are real datetimes.
df.start = pd.to_datetime(df.start)
df.end = pd.to_datetime(df.end)

# Column positions: 0 = id, 1 = start, 3 = diff.
# The first row always takes its own start.
target_start = [df.iloc[0, 1]]

for i in range(1, df.shape[0]):
    if df.iloc[i, 0] != df.iloc[i - 1, 0]:
        # The id changed: restart from this row's start.
        target_start.append(df.iloc[i, 1])
    else:
        if df.iloc[i - 1, 3] != 1:
            # The previous row's diff is not 1: a new run begins here.
            target_start.append(df.iloc[i, 1])
        else:
            # The previous row's diff is 1: carry its target_start forward.
            target_start.append(target_start[i - 1])

df["target_start"] = target_start
df.head(7)

Thanks again! You helped a lot.
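
For larger frames, the same rule can probably be written without an explicit loop. The attempt in the question fails because df['target_start'].shift(1) is evaluated once, so each row can only see one row back and any run longer than two rows is left as NaT. A forward fill handles runs of any length. The following is only a sketch under the assumption that the rows are already sorted by id and start; prev_id, prev_diff and new_run are names introduced here for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt here so the snippet runs on its own.
df = pd.DataFrame({
    "id": [12220, 12220, 12220, 12220, 12220, 11132, 11132],
    "start": pd.to_datetime([
        "1999-11-22", "2018-04-16", "2019-09-16", "2019-12-01",
        "2020-04-01", "2018-07-20", "2019-09-16",
    ]),
    "end": pd.to_datetime([
        "2008-08-31", "2019-09-15", "2019-11-30", "2020-03-31",
        "2020-06-30", "2019-09-15", "2021-01-01",
    ]),
    "diff": [3515, 1, 1, 1, -711, 1, -44197],
})

prev_id = df["id"].shift()      # id of the previous row (NaN on the first row)
prev_diff = df["diff"].shift()  # diff of the previous row (NaN on the first row)

# A row opens a new run when the id changes or the previous row's diff is not 1.
# The first row qualifies automatically because comparisons with NaN are unequal.
new_run = (df["id"] != prev_id) | (prev_diff != 1)

# Keep start only on run-opening rows, then carry it forward over the rest of the run.
df["target_start"] = df["start"].where(new_run).ffill()

print(df)
```

Because every id change opens a new run, the forward fill never carries a value from one id into the next, so no groupby is needed here.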
