from pyspark.sql import SparkSession 
from pyspark import SparkContext, SparkConf 
from pyspark.storagelevel import StorageLevel 
spark = SparkSession.builder.appName('TEST').config('spark.ui.port','4098').enableHiveSupport().getOrCreate()

df4 = spark.sql('select * from hive_schema.table_name limit 1')
print("query completed")
 
df4.unpersist() 

df4.count()

df4.show()

I executed the code above to clear the DataFrame and release its memory. However, df4.show() still works and displays the data.

Could you please point me to the right method to free the memory occupied by a Spark DataFrame?

  • unpersist() just marks the cached data as removable, letting Spark clear it asynchronously, whereas unpersist(True) blocks until the data has actually been removed from the cache before proceeding. Commented Nov 8, 2023 at 17:14

1 Answer


The function unpersist() just lets Spark know that it can remove the cached data when it wants to, rather than being a hard push to clean it up immediately. Passing the boolean True (the blocking parameter) makes the call wait until the data has actually been removed from the cache before proceeding.

df4.unpersist(True) 

Reference : https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.unpersist.html
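
For illustration, here is a minimal sketch (assuming the df4 DataFrame from the question and a running SparkSession; persist, is_cached, and unpersist are standard PySpark DataFrame APIs) showing the blocking unpersist and why show() still returns rows afterwards:

from pyspark.storagelevel import StorageLevel

# Cache the DataFrame and materialize the cache with an action.
df4.persist(StorageLevel.MEMORY_AND_DISK)
df4.count()
print(df4.is_cached)   # True: the plan is marked as cached

# Blocking unpersist: returns only after the cached blocks are removed.
df4.unpersist(True)
print(df4.is_cached)   # False: the cache flag is cleared

# show() still works because Spark recomputes the result from the source table;
# unpersist frees the cache, it does not delete the DataFrame or its source.
df4.show()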


2 Comments

Thanks Anand for taking the time to respond. I tried as advised, but show() still displays the data:

df4 = spark.sql('select * from hive_schema.table_name limit 1')
print("query completed")
df4.count()
df4.unpersist(True)
df4.count()
print(df4.count())
df4.show()
df4.show() triggers a compute operation, so the data is read again from the source; unpersist only affects the cache. If you want to monitor the memory, look at the Storage tab in the Spark UI to track it.
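
As a side note (my own suggestion, not part of the answer above): if the goal is to release every cached dataset in the session at once, PySpark also offers spark.catalog.clearCache(); dropping the Python reference is a separate, driver-side step:

# Remove all cached tables/DataFrames from this session's in-memory cache.
spark.catalog.clearCache()

# del only releases the driver-side Python object; executor memory is
# governed by the unpersist/clearCache calls above.
del df4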
