JMH-based benchmarks for GraphFrames algorithms using LDBC Graphalytics datasets.

| Benchmark | Description |
|---|---|
| `ShortestPathsBenchmark` | Shortest paths from a source vertex |
| `ConnectedComponentsBenchmark` | Connected components detection |
| `LabelPropagationBenchmark` | Community detection via label propagation |

`sbt "benchmarks/jmh:run -i 3 -wi 1 -f 1 -p graphName=wiki-Talk org.graphframes.benchmarks.ShortestPathsBenchmark"`

| Parameter | Values | Description |
|---|---|---|
| `graphName` | See Available Graphs | LDBC graph dataset to use |
| `algorithm` | `graphframes`, `graphx` | Algorithm implementation |
| `maxIter` | integer (default: 10) | Max iterations (iterative algorithms only, e.g. `LabelPropagationBenchmark`) |
| `useLocalCheckpoints` | `true`, `false` (default: `true`) | Use local checkpoints instead of regular checkpoints; faster but less reliable (not applicable to the `graphx` algorithm) |
| `broadcastThreshold` | integer (default: 1000000, or -1 for AQE) | Max vertex degree for join-based propagation; above this threshold a broadcast is used (`ConnectedComponentsBenchmark` only) |
| `checkPointInterval` | integer (default: 1) | Number of iterations between checkpoints (`ShortestPathsBenchmark` only, `graphframes` algorithm) |
| `startingVertex` | long integer (default: 1) | Source vertex ID for shortest paths computation (`ShortestPathsBenchmark` only) |
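
Several parameters can be combined in one run by repeating the JMH `-p name=value` option, as the commands in this README already do. The following is a minimal sketch of a helper that assembles such a command line. The function name `jmh_cmd` is hypothetical and not part of the repository, and it only prints the command rather than running it.

```shell
# Hypothetical helper (not part of the repository): prints an sbt command for
# one benchmark class, turning each name=value argument into a JMH -p option.
jmh_cmd() {
  local benchmark="$1"; shift
  local params=""
  local kv
  for kv in "$@"; do
    params="$params -p $kv"
  done
  printf 'sbt "benchmarks/jmh:run%s org.graphframes.benchmarks.%s"\n' "$params" "$benchmark"
}

# Example: connected components on wiki-Talk, graphframes implementation only,
# with broadcastThreshold=-1 (the AQE setting from the table above).
jmh_cmd ConnectedComponentsBenchmark graphName=wiki-Talk algorithm=graphframes broadcastThreshold=-1
```

The printed line can be copied into a terminal or piped to `sh` once it looks right.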
Run all algorithms on wiki-Talk:

`sbt "benchmarks/jmh:run -p graphName=wiki-Talk org.graphframes.benchmarks.ShortestPathsBenchmark"`

Run only the GraphX implementation:

`sbt "benchmarks/jmh:run -p algorithm=graphx -p graphName=cit-Patents org.graphframes.benchmarks.ConnectedComponentsBenchmark"`

Run with a custom iteration count:

`sbt "benchmarks/jmh:run -p maxIter=5 -p graphName=wiki-Talk org.graphframes.benchmarks.LabelPropagationBenchmark"`

| Graph | Vertices | Edges | Weighted |
|---|---|---|---|
| `wiki-Talk` | ~2.4M | ~5.0M | No |
| `cit-Patents` | ~3.8M | ~16.5M | No |
| `kgs` | ~800K | ~23.5M | Yes |
| `graph500-22` | ~4.2M | ~67M | No |
| `graph500-23` | ~8.4M | ~134M | No |
| `graph500-24` | ~16.8M | ~268M | No |
| `graph500-25` | ~33.6M | ~537M | No |

Test graphs (small, for validation):

- `test-bfs-directed`, `test-bfs-undirected`
- `test-cdlp-directed`, `test-cdlp-undirected`
- `test-pr-directed`, `test-pr-undirected`
- `test-wcc-directed`, `test-wcc-undirected`

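
Because the test graphs are small, they suit a quick check before a long run on a full dataset. The sketch below prints one short command per WCC test graph. Pairing the `test-wcc-*` graphs with `ConnectedComponentsBenchmark` is an assumption based on the graph names, and the function name `validation_cmds` is hypothetical.

```shell
# Dry run: print one quick validation command per small WCC test graph.
# Assumption: the test-wcc-* graphs are meant for ConnectedComponentsBenchmark.
validation_cmds() {
  local graph
  for graph in test-wcc-directed test-wcc-undirected; do
    # -i 1 -wi 1 -f 1: one measurement iteration, one warmup iteration, one fork
    printf 'sbt "benchmarks/jmh:run -i 1 -wi 1 -f 1 -p graphName=%s org.graphframes.benchmarks.ConnectedComponentsBenchmark"\n' "$graph"
  done
}

validation_cmds
```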
Benchmarks use LDBC Graphalytics parquet format:
- Vertices: `{graphName}-v.parquet` with column `id`
- Edges: `{graphName}-e.parquet` with columns `src`, `dst` (and `weight` for weighted graphs)

Data is downloaded on first use and cached in `target/ldbc-cache/`.
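
To see whether a dataset is already in the cache before starting a run, one can check for the two files named above. This is a minimal sketch that assumes the files sit directly under `target/ldbc-cache/`; the actual layout inside the cache directory may differ. The names `ldbc_paths`, `is_cached` and `CACHE_DIR` are hypothetical.

```shell
# Assumption: cached files sit directly under target/ldbc-cache/ and use the
# file names described above; the real cache layout may differ.
CACHE_DIR="${CACHE_DIR:-target/ldbc-cache}"

# Print the expected vertex and edge paths for a graph name, one per line.
ldbc_paths() {
  printf '%s/%s-v.parquet\n%s/%s-e.parquet\n' "$CACHE_DIR" "$1" "$CACHE_DIR" "$1"
}

# Succeed only if both paths exist (-e matches files and directories alike).
is_cached() {
  local path
  for path in $(ldbc_paths "$1"); do
    [ -e "$path" ] || return 1
  done
  return 0
}

if is_cached wiki-Talk; then
  echo "wiki-Talk: cached"
else
  echo "wiki-Talk: not cached yet"
fi
```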
- Benchmarks require ~10GB heap memory (configured automatically)
- Data files are cached locally to avoid repeated downloads
- The old `LDBCBenchmarkSuite` is deprecated; use the new split benchmarks