
bug: Connected Components gives wrong results #453

@wisundstrom

Description


TL;DR
I ran into a bug with Spark 3.5 and GraphFrames 0.8.3: unless spark.sql.adaptive.enabled is set to false, connected components returns incorrect results, and many edges appear to be ignored when finding paths.

I've been working on a project that is unfortunately in an air-gapped environment, so I can't share code, but I'll try to provide whatever information I can.

We have been using GraphFrames connected components for a few years, and we recently migrated to Spark 3.5.0 and GraphFrames 0.8.3.
We also migrated from a YARN cluster with HDFS to a Kubernetes cluster with MinIO object storage.

For us, both connected components and BFS are returning results that look like they are not using edges that are present in the edge set.

If we set spark.sql.adaptive.enabled = false, the results appear to be calculated as we would expect.
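
For anyone who wants to try the same workaround without changing the setting cluster-wide, here is a minimal sketch in Python. The helper name `aqe_disabled` is hypothetical (it is not part of Spark or GraphFrames), and the sketch assumes `spark` is an existing SparkSession. It only uses `spark.conf.get` and `spark.conf.set`.

```python
from contextlib import contextmanager

AQE_KEY = "spark.sql.adaptive.enabled"


@contextmanager
def aqe_disabled(spark):
    """Temporarily turn off adaptive query execution on a SparkSession.

    `aqe_disabled` is a hypothetical helper, not a Spark or GraphFrames API.
    """
    # Remember the current value; fall back to "true" if the key is unset.
    previous = spark.conf.get(AQE_KEY, "true")
    spark.conf.set(AQE_KEY, "false")
    try:
        yield spark
    finally:
        # Restore whatever was configured before entering the block.
        spark.conf.set(AQE_KEY, previous)


# Usage sketch (assumes `graph` is an existing GraphFrame):
#
# with aqe_disabled(spark):
#     components = graph.connectedComponents()
#     components.count()  # run the action while AQE is still off
```

Because DataFrames are evaluated lazily, any action on the result should probably run inside the `with` block, otherwise later stages may execute with adaptive execution switched back on.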

I mostly wanted to put this here in case anyone else is pulling their hair out over this one, but if anyone has ideas about what could be causing this, it would be good to fix. Silent errors like this can be really nasty.
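
Since the failure is silent, one way to check whether a given setup is affected is to compare the output against an independent implementation on a small sample that fits in driver memory. The sketch below is plain Python using union-find. It is not how GraphFrames computes components, and `reference_components` is a hypothetical name.

```python
def reference_components(edges):
    """Return the connected components of an undirected edge list.

    Each component is a frozenset of vertex ids. This is a plain
    union-find reference, intended only for small samples.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Walk up to the root of x's tree.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            parent[b] = a  # merge the two trees

    groups = {}
    for vertex in list(parent):
        groups.setdefault(find(vertex), set()).add(vertex)
    return {frozenset(members) for members in groups.values()}
```

To compare, collect the sampled edges' `src` and `dst` columns into a list of pairs, group the `id` values of the `connectedComponents()` output by its `component` column into the same set-of-frozensets shape, and check that the two sets are equal. Vertices with no edges will not appear in the reference result, so they need to be left out of the comparison.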
