feat: remove skewedJoin from ConnectedComponents #688
Closed
Description
Is your feature request related to a problem? Please describe.
- it calls collect() on each iteration
- it may interfere with Spark's Adaptive Query Execution (AQE)
- AQE should handle join skew by itself
- the skew threshold is so large that it is rarely reached
- this is especially relevant for data-deduplication tasks, where components are small
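The overhead described above can be illustrated with a hypothetical, pure-Python stand-in for the skew-detection step. This is not the GraphFrames code: `find_skewed_keys` and the threshold value are assumptions made for illustration. Heavy keys are found by counting edges per vertex and comparing each count with a threshold; in Spark, that result would be collected to the driver on every iteration. When every component is tiny, as in deduplication, no key comes close to a large threshold, so the extra pass does work without ever changing the join.

```python
from collections import Counter


def find_skewed_keys(edges, threshold):
    """Return the source vertices whose out-degree exceeds `threshold`.

    Hypothetical stand-in: in Spark this would be a groupBy/count followed by
    a collect() to the driver, i.e. an extra job on every iteration.
    """
    counts = Counter(src for src, _ in edges)
    return {key for key, n in counts.items() if n > threshold}


# Hypothetical deduplication-style graph: 1000 tiny components of 3 vertices.
edges = []
for c in range(1000):
    base = c * 3
    edges.append((base, base + 1))
    edges.append((base, base + 2))

# Assumed (illustrative) large threshold: no vertex has more than 2 edges.
skewed = find_skewed_keys(edges, threshold=1_000_000)
print(len(skewed))  # prints 0
```

With small components the skewed set is always empty, so the per-iteration count and collect are pure overhead.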
Describe the solution you would like
- remove it entirely
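Without the custom skew handling, skew mitigation would be left to Spark's AQE. As a minimal sketch (the names `AQE_SKEW_CONF` and `apply_conf` are hypothetical), the two Spark settings involved can be applied to a `SparkSession.builder`. On Spark 3.2 and later both already default to `true`, so setting them only makes the dependency on AQE explicit.

```python
# Real Spark SQL configuration keys that enable AQE and its skew-join
# optimization. The dict name itself is hypothetical.
AQE_SKEW_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_conf(builder, conf):
    """Apply each key/value pair via builder.config(key, value).

    Works with pyspark's SparkSession.builder, whose config() returns the
    builder, so calls can be chained.
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

Usage with PySpark would look like `spark = apply_conf(SparkSession.builder, AQE_SKEW_CONF).getOrCreate()`.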
Component
- Scala Core Internal
- Scala API
- Spark Connect Plugin
- Infrastructure
- PySpark Classic
- PySpark Connect
Additional context
Removing it may improve performance by 10-15%, which is significant on graphs with billions of edges.
Are you planning on creating a PR?
- I'm willing to make a pull-request