bug: ConnectedComponents: error 'Unable to infer schema for Parquet. It must be specified manually' #201

@opsomerto

Hi,

When computing `connectedComponents` using the graphframes algorithm, I get the following error:

```
File "/root/.ivy2/jars/graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar/graphframes/graphframe.py", line 279, in connectedComponents
File "/usr/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/spark-2.1.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'
```

Looking at the code, I suspect this is caused by saving an empty checkpoint in Parquet format, since Spark has similar issues when loading empty Parquet files. So this may be more of a Spark issue, but in the meantime, could checking whether the `ee` DataFrame is empty before saving it to Parquet help?
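A minimal sketch of the suggested guard, written as a standalone user-side helper rather than a patch to graphframes. The name `write_parquet_checkpoint` is hypothetical; the sketch assumes only PySpark's `DataFrame.head(n)` (which returns a list of at most `n` rows) and `DataFrame.write.parquet(path)`, and it assumes, as the issue suggests, that on Spark 2.1 an empty DataFrame leaves behind a directory that cannot be read back without an explicit schema.

```python
def write_parquet_checkpoint(df, path):
    """Write `df` to Parquet at `path` unless it has no rows.

    Hypothetical helper illustrating the guard proposed in this issue.
    Returns True if the DataFrame was written, False if it was skipped.
    Skipping avoids producing a Parquet directory with no data files,
    which (per this issue) Spark 2.1 fails to read back with
    "Unable to infer schema for Parquet".
    """
    # head(1) fetches at most one row and returns a list, so this is
    # much cheaper than df.count() on a large DataFrame.
    if len(df.head(1)) == 0:
        return False
    df.write.parquet(path)
    return True
```

When the write is skipped, the caller can keep using the empty in-memory DataFrame instead of reading it back. The other route, hinted at by the error message itself, is to pass the schema when reading, e.g. `spark.read.schema(df.schema).parquet(path)`; that works here because the code doing the checkpoint already knows the schema of what it wrote.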

Thomas.
