
I am using the pandas API on Spark for some data-preprocessing files that were originally written in pandas. I am seeing that the date operations are very slow, and some are not compatible at all. For example, I cannot do this:

df[time_col] + pd.Timedelta(1, unit='D')

Instead, I had to write the operation below:

from datetime import timedelta

df[time_col].apply(lambda x: x + timedelta(days=1))

Is there any other way I can use a date_add-style operation? And why would pandas on Spark be so slow under the hood?

I have tried the equivalent PySpark code, which uses an interval operation and works fast.
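The interval-based PySpark version I am comparing against looks roughly like this (the DataFrame and column names are placeholders, not my actual data):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder DataFrame with a single timestamp column named "time_col".
sdf = spark.createDataFrame([("2024-01-01 00:00:00",)], ["time_col"])
sdf = sdf.withColumn("time_col", F.to_timestamp("time_col"))

# Shift the timestamp by one day using Spark's interval arithmetic;
# this stays inside the JVM and avoids a per-row Python lambda.
sdf = sdf.withColumn("time_col", F.col("time_col") + F.expr("INTERVAL 1 DAY"))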

  • Why can't you do df[time_col] + pd.Timedelta(1, unit='D')? Do you get a wrong result? Do you get an error message? You have to show it in the question (not in comments). We can't run your code, see your computer, or read your mind, so you have to show all the details in the question. It would also be simpler if you created a minimal reproducible example with sample data in code, so we could simply copy and run it. Commented Jun 11, 2024 at 1:02
  • In the question you could also show the "PySpark code which has the interval operation", so we could run it and compare the timings. Commented Jun 11, 2024 at 1:03
  • If PySpark works faster, then maybe you should convert the pandas DataFrame to PySpark, do the calculations there, and convert the result back to a pandas DataFrame (a rough sketch of that round trip is shown after these comments). Commented Jun 11, 2024 at 1:06
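A minimal sketch of that round trip, assuming a small pandas DataFrame with a timestamp column named "ts" (all names and data here are made up for illustration):

import pandas as pd
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Made-up pandas input for illustration.
pdf = pd.DataFrame({"ts": pd.date_range("2024-01-01", periods=3, freq="D")})

# pandas -> Spark, do the date arithmetic in Spark, then Spark -> pandas.
sdf = spark.createDataFrame(pdf)
sdf = sdf.withColumn("ts", F.col("ts") + F.expr("INTERVAL 1 DAY"))
pdf_shifted = sdf.toPandas()
print(pdf_shifted)

The conversions themselves have a cost (especially toPandas(), which collects everything to the driver), so this only pays off when the Spark-side computation is the dominant part.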
