I am trying to create a session template of dates for a dataframe in pandas based on the start and end day of the week of my given dataframe. I have the start and end day abbreviations (Mo, Tu, We, etc.) and the start/end time (8:30 AM, 5:30 PM, etc.).

What I want to create is a template that gives the start day abbreviation, the times over the days that it spans, and the end day. For example, my dataframe currently looks like the following:

Start Time   End Time     Start/End               Namestart  Nameend  Days   Session Template
Mo 8:30 AM   Th 5:30 PM   Mo 8:30 AM-Th 5:30 PM   Mo         Th       4 Day  4 Day Mo 8:30 AM-Th 5:30 PM
We 8:30 AM   Fr 12:30 PM  We 8:30 AM-Fr 12:30 PM  We         Fr       3 Day  3 Day We 8:30 AM-Fr 12:30 PM

The current session template gives me the day count, the start and end times, and the days of the week it begins and ends on. However, I would like it to list each individual day that the item spans, with the times for that day. For the examples above it should yield:

4 Day Mo 8:30 AM-5:30 PM, Tu 8:30 AM-5:30 PM, We 8:30 AM-5:30 PM, Th 8:30 AM-5:30 PM
3 Day We 8:30 AM-5:30 PM, Th 8:30 AM-5:30 PM, Fr 8:30 AM-12:30 PM

4 Comments

  • If I understand your problem correctly, I would suggest removing the day abbreviations from the Start_time and End_time. So only use times in those columns and you already have the Namestart and Nameend. You can then loop over the days from Namestart to Nameend and for each day, print the Start_time and End_time. Commented Nov 18, 2019 at 15:57
  • And if you do not want to modify your data frame, you should think in the direction of substring extraction from those two columns, to get the time values like 8:30 AM and 5:30 PM. You will essentially reach a similar step as above but without modifying the data frame. It can be done elegantly using regex (regular expressions). Commented Nov 18, 2019 at 16:01
  • Yes this is what I am trying to solve. How would you loop over the days based on the day count? I will have different values in the 'Days' field (ranging from 1 day to 5 days) and do not want to hard code each individual value. Commented Nov 18, 2019 at 16:02
  • One way is to store a list like days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'] and increment a counter after you print each day. Another way is to store a dictionary of (day, index) pairs, like days = {'Mo':1, 'Tu':2, 'We':3, 'Th':4, 'Fr':5, 'Sa':6, 'Su':7}, and then use the difference between the corresponding indices. So between Tu and Fr, you have 5-2+1=4 days. You will need to take care of the edge cases. Commented Nov 18, 2019 at 16:08

1 Answer

Here is how you can do it:

import re

import pandas as pd

pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)

df = pd.read_csv("data.csv")
print(df, "\n")

days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su']
# matches a time such as "8:30 AM" that follows the day abbreviation
time_pattern = re.compile(r'\s(\d+:\d{2}\s?(?:AM|PM|am|pm))')
# end time for every day except the last one (hard-coded, as requested in the comments)
default_end_time = "5:30 PM"

for index, row in df.iterrows():
    # get the start and end days
    start_day = row['Namestart']
    end_day = row['Nameend']
    # get the start and end times
    start_time = time_pattern.findall(row['Start Time'])[0]
    end_time = time_pattern.findall(row['End Time'])[0]
    # get the indices corresponding to the start and end days
    start_index = days.index(start_day)
    end_index = days.index(end_day) + 1
    # count the number of days
    cnt = end_index - start_index
    print(cnt, "days\t", end='')
    # slice the days list from start_index to end_index
    for day in days[start_index:end_index]:
        if day != end_day:
            # intermediate days end at the default end time
            print(day, start_time, "-", default_end_time + "\t", end='')
        else:
            # the last day ends at the row's own end time
            print(day, start_time, "-", end_time, end='')
    print()  # start a new line after each row

Output:

   Start Time     End Time               Start/End Namestart Nameend   Days              Session Template
0  Mo 8:30 AM   Th 5:30 PM   Mo 8:30 AM-Th 5:30 PM        Mo      Th  4 Day   4 Day Mo 8:30 AM-Th 5:30 PM
1  We 8:30 AM  Fr 12:30 PM  We 8:30 AM-Fr 12:30 PM        We      Fr  3 Day  3 Day We 8:30 AM-Fr 12:30 PM 

4 days  Mo 8:30 AM - 5:30 PM    Tu 8:30 AM - 5:30 PM    We 8:30 AM - 5:30 PM    Th 8:30 AM - 5:30 PM
3 days  We 8:30 AM - 5:30 PM    Th 8:30 AM - 5:30 PM    Fr 8:30 AM - 12:30 PM

The comments in the code explain each step. An explanation of the regular expression I have used can be found in this answer: https://stackoverflow.com/a/49217300/6590393.

Also, please note that the above code assumes you only move forward in the list. A span that wraps around the end of the week, for example Sa-Mo, would not yield the expected result. I leave it to you to handle such boundary cases if you need them.
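
One possible way to handle a span that wraps past the end of the week is modulo arithmetic on the day indices. The following is only a sketch and not part of the original answer: the helper name days_spanned is hypothetical, and it assumes a span never covers more than seven days.

```python
DAYS = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su']


def days_spanned(start_day, end_day):
    """Return the day abbreviations from start_day to end_day, inclusive.

    Wraps past the end of the week when end_day comes before start_day.
    Assumption: the span is at most seven days long.
    """
    start_index = DAYS.index(start_day)
    end_index = DAYS.index(end_day)
    # number of days in the span; the modulo handles the wrap-around
    count = (end_index - start_index) % len(DAYS) + 1
    return [DAYS[(start_index + offset) % len(DAYS)] for offset in range(count)]


print(days_spanned('We', 'Fr'))
print(days_spanned('Sa', 'Mo'))
```

With a helper like this, the loop over the sliced days list could iterate over days_spanned(start_day, end_day) instead, and the day count becomes the length of the returned list. Note that a pair such as Mo-Mo is treated as a single day, not a full week.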

4 Comments

This is great, thank you so much for your help. There is only one issue still - the end time is only applicable on the end date, so on the days in between the item would not end at 12:30 PM but instead 5:30 PM. For example, the second output should be: 3 Day We 8:30 AM-5:30 PM, Th 8:30 AM-5:30 PM, Fr 8:30 AM-12:30 PM. Is there a way to just hard code the end time for the days in between to be 5:30 PM?
Edited the code. Please accept the answer if it solved your problem.
How could I return the results as a new column say df['session template'] rather than printing the new lines? I tried: for day in itertools.islice(days, start_index, end_index): if (day!=end_day): print(day, start_time, "- 5:30 PM,\t", end='') else: print(day, start_time, "-", end_time, ",", end='') x['session temp'] = day + start_time + "-" + end_time + ","
You can add new columns to a Pandas data frame. Add a column called 'session temp' and then update the values in that column instead of/apart from printing. You'll need to change your x['session temp'] to df['session temp'].
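
Following up on the last comment, one way to collect the result in a new column instead of printing it is to build the string per row in a function and pass that function to df.apply with axis=1. This is a minimal sketch under some assumptions: the function name session_template and the small inline data frame are made up for illustration (the data frame stands in for data.csv), and 5:30 PM is assumed as the end time for every day except the last, as requested above.

```python
import re

import pandas as pd

DAYS = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su']
TIME_PATTERN = re.compile(r'\s(\d+:\d{2}\s?(?:AM|PM|am|pm))')
DEFAULT_END_TIME = "5:30 PM"  # assumed end time for every day but the last


def session_template(row):
    """Build the per-day template string for one row (hypothetical helper)."""
    start_time = TIME_PATTERN.findall(row['Start Time'])[0]
    end_time = TIME_PATTERN.findall(row['End Time'])[0]
    start_index = DAYS.index(row['Namestart'])
    end_index = DAYS.index(row['Nameend']) + 1
    span = DAYS[start_index:end_index]  # forward-only, like the answer's code
    parts = []
    for position, day in enumerate(span):
        is_last = position == len(span) - 1
        day_end = end_time if is_last else DEFAULT_END_TIME
        parts.append(f"{day} {start_time}-{day_end}")
    return f"{len(span)} Day " + ", ".join(parts)


# small stand-in for the data read from data.csv
df = pd.DataFrame({
    'Start Time': ['Mo 8:30 AM', 'We 8:30 AM'],
    'End Time': ['Th 5:30 PM', 'Fr 12:30 PM'],
    'Namestart': ['Mo', 'We'],
    'Nameend': ['Th', 'Fr'],
})
df['session temp'] = df.apply(session_template, axis=1)
print(df['session temp'].tolist())
```

Returning a string from the function and assigning the result of apply keeps the column assignment in one place, rather than updating the data frame inside the loop.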
