feat: provide an alternative big-star -- small-star implementation without a limit on the number of edges #339

@seirl

Description

Right now the ConnectedComponents algorithm has a hard limit on the number of edges as an implementation detail: https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/lib/ConnectedComponents.scala#L371-L374

At Software Heritage, the graph on which we run this algorithm currently has >100B edges and is growing exponentially. We think it might cross the 200B-edge threshold in 1-2 years.
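For a rough sense of that timeline, here is a back-of-the-envelope sketch of the arithmetic. The doubling times are assumptions chosen for illustration, not measured growth rates, and `years_until_threshold` is a hypothetical helper that is not part of GraphFrames.

```python
import math


def years_until_threshold(current_edges, threshold_edges, doubling_time_years):
    """Years until an exponentially growing edge count reaches the threshold.

    Assumes pure exponential growth with a constant doubling time.
    """
    if current_edges >= threshold_edges:
        return 0.0  # already at or past the threshold
    return doubling_time_years * math.log2(threshold_edges / current_edges)


current = 100e9  # lower bound taken from the ">100B edges" figure above
limit = 200e9    # the 200B threshold mentioned above

# Assumed doubling times, for illustration only.
for doubling in (1.0, 1.5, 2.0):
    years = years_until_threshold(current, limit, doubling)
    print(f"doubling every {doubling} years: threshold reached in {years:.1f} years")
```

Since 200B is one doubling away from 100B, the estimate equals the assumed doubling time. The graph already has more than 100B edges, so the real margin is shorter.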

Would it be possible to increase this limit so that the algorithm is able to handle larger graphs?

Thanks!
