1

I have a sample data that looks like this.

Column DateDuration was calculated in Excel, following below logic:

  • DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.
  • DateDuration between SecondDate and FirstDate <28, if ThirdDate = nan, then DateDuration = SecondDate - FirstDate.
  • DateDuration between SecondDate and FirstDate <28, if ThirdDate = not nan, then consider (ThirdDate - FirstDate):
    • ThirdDate - FirstDate >= 28, then DateDuration = ThirdDate - FirstDate.
    • ThirdDate - FirstDate < 28, if FourthDate = nan, then DateDuration = ThirdDate - FirstDate.
    • ThirdDate - FirstDate < 28, if FourthDate = not nan, then DateDuration = FourthDate - FirstDate.

I would like to calculate DateDuration in Python but do not know how to do about this.

Types of data in Python:

  • ID int64
  • FirstDate object
  • SecondDate object
  • ThirdDate object
  • FourthDate object

I am new to Python. Any help would be greatly appreciated!!

import pandas as pd
import numpy as np
df['FirstDate'] = pd.to_datetime(df['FirstDate'])
df['SecondDate'] = pd.to_datetime(df['SecondDate'])
df['DayDifference2'] = (df['SecondDate']) -(df['FirstDate'])
df['DayDifference3'] = (df['ThirdDate']) -(df['FirstDate'])
df['DayDifference4'] = (df['FourthDate']) -(df['FirstDate'])
    
x = df['DayDifference2'].dt.days
y = df['DayDifference3'].dt.days
z = df['DayDifference4'].dt.days

condlist = [x<28, x>=28]
choicelist = [(df['ThirdDate']) -(df['FirstDate']), (df['SecondDate']) -(df['FirstDate'])]
np.select(condlist, choicelist)

My data:

ID FirstDate SecondDate ThirdDate FourthDate DateDuration
2914300 2021-09-23 2021-10-07 2021-11-29 2021-12-20 67
3893461 2021-09-08 2021-10-06 2022-04-07 211
4343075 2021-06-23 2021-09-27 96
4347772 2021-06-23 2021-09-27 96
4551963 2021-08-02 2021-10-14 2022-03-11 73
4893324 2021-09-30 2021-10-01 2022-03-03 2022-03-10 154
5239991 2021-06-24 2021-08-26 2021-09-25 2022-02-03 63
8454947 2021-09-28 2021-10-05 7
8581390 2021-09-27 2022-03-21 2022-03-25 175
8763766 2021-09-20 2021-10-04 2021-12-09 80
9144185 2021-06-18 2021-06-23 5
9967685 2021-09-13 2021-10-29 2022-02-07 2022-03-23 46
11367560 2021-08-31 2021-09-28 2021-10-21 2022-02-11 51
3
  • Sorry I did try including my data as a table but it did not work. I had to put data in the code!! Commented Jul 13, 2022 at 2:12
  • Do print(df) and copy/paste the output. Commented Jul 13, 2022 at 2:23
  • I tried doing that but it did not work either. Thanks anw! Commented Jul 13, 2022 at 20:40

2 Answers 2

0
import pandas as pd
import numpy as np

df = pd.read_csv('date_example.csv')
df.loc[:,'FirstDate':'FourthDate'] = df.loc[:,'FirstDate':'FourthDate'].astype('datetime64[ns]')
df

enter image description here NaT is a missing value of datetime64[ns] type




Conditions & Choices

conditions = [
    (df['SecondDate'] - df['FirstDate']).dt.days >= 28,
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].isna(),
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days >= 28),
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].isna(),
    ((df['SecondDate'] - df['FirstDate']).dt.days < 28) & df['ThirdDate'].notna() & ((df['ThirdDate'] - df['FirstDate']).dt.days < 28) & df['FourthDate'].notna()
]

choices = [
    (df['SecondDate'] - df['FirstDate']).dt.days,
    (df['SecondDate'] - df['FirstDate']).dt.days,
    (df['ThirdDate'] - df['FirstDate']).dt.days,
    (df['ThirdDate'] - df['FirstDate']).dt.days,
    (df['FourthDate'] - df['FirstDate']).dt.days
]



df['Duration'] = np.select(conditions, choices)
df

Result enter image description here



Discussion: There are some differences, e.g., second row, ID = 3893461, according to your conditions(DateDuration between SecondDate and FirstDate >= 28, then DateDuration = SecondDate - FirstDate.), SecondDate - FirstDate of ID = 3893461 is 28, same thing happened on last row, ID = 11367560

Sign up to request clarification or add additional context in comments.

1 Comment

Sorry, my bad. My result was based on DateDuration >28 or <=28. Your code works perfectly. Thanks Baron!
0

Refer to the time module built-in. It allows for more time class types that I actually used for my own workout routine maker.

import datetime as dt

# particularly the types:
dt.timedelta(1)
# and
dt.time(minute=0, second=0)
# there are also date classes you can use. 

Documentation: https://docs.python.org/3/library/datetime.html

1 Comment

Thanks for sharing this documentation!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.