I am getting the following error when using the pandas API on Spark (pyspark.pandas):
PandasNotImplementedError: The method pd.Series.__iter__() is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead
Here is my code:
import numpy as np
import pyspark.pandas as ps
df_mas = spark.read.format("csv").option("header", "true").load(driver.config["OutputFiles"])
df = df_mas.pandas_api()
df["MAUS"] = np.where(df.MAUS == "NHTT", "MHINC", df.MAUS)
display(df)
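
The error most likely comes from the np.where call: NumPy tries to iterate over the pandas-on-Spark Series, and that Series does not implement __iter__. One possible workaround, sketched below, is to do the conditional replacement with the Series' own mask method so the work stays in Spark. The helper name recode_maus is hypothetical, and the demo runs on plain pandas with made-up values as a stand-in. It assumes pyspark.pandas.Series.mask behaves the same way for this case, and it has not been run against the data in the question.

```python
import pandas as pd  # plain pandas stands in for pyspark.pandas in this demo


def recode_maus(frame):
    """Replace "NHTT" with "MHINC" in the MAUS column (hypothetical helper).

    Series.mask(cond, other) keeps a value where cond is False and
    substitutes `other` where cond is True, so np.where is not needed.
    """
    frame["MAUS"] = frame["MAUS"].mask(frame["MAUS"] == "NHTT", "MHINC")
    return frame


# Made-up sample values, used only to illustrate the call.
demo = recode_maus(pd.DataFrame({"MAUS": ["NHTT", "ABC", "NHTT"]}))
print(demo["MAUS"].tolist())
```

With the DataFrame from the question, the equivalent call would be df = recode_maus(df) after df_mas.pandas_api().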