feat: aggregate nbrs api#792
Conversation
core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
|
Hi @james-willis ! I addressed most of your comments. For some I left my answers. It looks like we are close to a final version. Could be nice if you can take another look, so I can work on bindings and docs. |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #792 +/- ##
==========================================
- Coverage 84.94% 80.80% -4.15%
==========================================
Files 68 78 +10
Lines 3507 4428 +921
Branches 453 527 +74
==========================================
+ Hits 2979 3578 +599
- Misses 528 850 +322 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I will work on tests, bindings and docs. |
|
@james-willis this one is ready for review |
|
Looking at this PR, here are my recommendations: Code Quality & Performance
API Design
Testing & Documentation
Implementation Robustness
Overall, this is a solid implementation that follows GraphFrames patterns well. The main areas for improvement are around performance optimization configuration and edge case handling. The comprehensive test suite and documentation are excellent. Comment by Claude (AI Assistant) |
|
yeah that was it. sgtm. All other answer sgtm on that Claude review. |
…andom walks, and PG changes Resolved merge conflicts and integrated updates from graphframes/main into branch 785-aggregate-nbrs. Key changes: - Added sampling and convolution primitives (KMinSampling, SamplingConvolution) and related tests. - Introduced embeddings and random-walk features: Hash2Vec, RandomWalkEmbeddings, RandomWalkBase, RandomWalkWithRestart, and example runners. - Added new library components and algorithms: TwoPhase, Updated ConnectedComponents/KCore/Pregel/RandomizedContraction logic. - Added benchmarks and benchmark infrastructure for various algorithms. - Python API improvements: new property-graph package (python/graphframes/pg), internal utilities, updated client and protobuf bindings. - Build, docs, and packaging updates: build.sbt, docs (including graph-ml page), NOTICE, AGENTS.md, and pre-commit config. - Updated Spark shims and connect utilities to support the new features. - Added and updated tests across Scala and Python to cover the new functionality. All conflicts were fixed and changes staged for commit.
There was a problem hiding this comment.
Pull request overview
Adds a new multi-hop traversal API (“AggregateNeighbors”) to GraphFrames, implemented in Scala core with Spark Connect support and exposed via Python (classic + connect), along with documentation and tests to close #785.
Changes:
- Introduces
org.graphframes.lib.AggregateNeighbors(Scala) and wires it intoGraphFrame.aggregateNeighbors. - Extends Spark Connect protocol + server/client plumbing to invoke AggregateNeighbors remotely.
- Adds Python API wrappers (plus helper for attribute references), docs, and unit tests.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala |
New Scala implementation + builder-style API and attribute helpers (srcAttr/dstAttr/edgeAttr). |
core/src/main/scala/org/graphframes/GraphFrame.scala |
Exposes aggregateNeighbors entrypoint on GraphFrame. |
core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala |
Scala unit tests for traversal behavior (paths, filters, stopping, etc.). |
connect/src/main/protobuf/graphframes.proto |
Adds AggregateNeighbors message and oneof method field. |
connect/src/main/scala/.../GraphFramesConnectUtils.scala |
Server-side Connect planner support for AggregateNeighbors. |
python/graphframes/connect/graphframes_client.py |
Python Spark Connect client method building the protobuf plan. |
python/graphframes/connect/proto/graphframes_pb2.py / .pyi |
Regenerated protobuf bindings including AggregateNeighbors message. |
python/graphframes/graphframe.py |
Public Python GraphFrame.aggregate_neighbors + AggregateNeighbors helper class; docstring for API. |
python/graphframes/classic/graphframe.py |
Classic (Py4J) implementation of aggregate_neighbors calling JVM builder. |
python/tests/test_graphframes.py |
Adds Python tests for basic AggregateNeighbors scenarios. |
docs/src/04-user-guide/05-traversals.md |
User guide section documenting Aggregate Neighbors with examples. |
project/LaikaCustoms.scala |
Doc site navigation tweak (page navigation depth). |
build.sbt |
Adjusts protoc version selection for Spark 3 builds. |
python/dev/build_jar.py |
Updates default Spark version used by the dev jar build script. |
.gitignore |
Ignores additional AI-tool related files/directories. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What changes were proposed in this pull request?
New API as described in #785
Why are the changes needed?
Close #785