GraphFrames Benchmarks

JMH-based benchmarks for GraphFrames algorithms using LDBC Graphalytics datasets.

Available Benchmarks

Benchmark	Description
`ShortestPathsBenchmark`	Shortest paths from a source vertex
`ConnectedComponentsBenchmark`	Connected components detection
`LabelPropagationBenchmark`	Community detection via label propagation

Running Benchmarks

Basic Usage

sbt "benchmarks/jmh:run -i 3 -wi 1 -f 1 -p graphName=wiki-Talk org.graphframes.benchmarks.ShortestPathsBenchmark"

Parameters

Parameter	Values	Description
`graphName`	See Available Graphs	LDBC graph dataset to use
`algorithm`	`graphframes`, `graphx`	Algorithm implementation
`maxIter`	integer (default: 10)	Max iterations (iterative algorithms only, e.g. `LabelPropagationBenchmark`)
`useLocalCheckpoints`	`true`, `false` (default: `true`)	Use local checkpoints instead of regular checkpoints; faster but less reliable (not applicable to `graphx` algorithm)
`broadcastThreshold`	integer (default: 1000000, or `-1` for AQE)	Max vertex degree for join-based propagation; above this threshold a broadcast is used (`ConnectedComponentsBenchmark` only)
`checkPointInterval`	integer (default: 1)	Number of iterations between checkpoints (`ShortestPathsBenchmark` only, `graphframes` algorithm)
`startingVertex`	long integer (default: 1)	Source vertex ID for shortest paths computation (`ShortestPathsBenchmark` only)
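
The parameters above are passed to JMH as repeated `-p name=value` flags. As a minimal sketch, the hypothetical helper `jmh_cmd` below (not part of this repository) assembles such a command line and prints it instead of running it. The `-i 3 -wi 1 -f 1` settings are copied from the Basic Usage command; adjust them as needed.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not shipped with GraphFrames):
# builds the sbt-jmh command line for one benchmark class.
# Usage: jmh_cmd <BenchmarkClass> [param=value ...]
jmh_cmd() {
  bench="$1"
  shift
  # JMH flags taken from the Basic Usage command:
  # 3 measurement iterations, 1 warmup iteration, 1 fork.
  args="benchmarks/jmh:run -i 3 -wi 1 -f 1"
  # Each remaining argument becomes one "-p name=value" flag.
  for p in "$@"; do
    args="$args -p $p"
  done
  printf 'sbt "%s org.graphframes.benchmarks.%s"\n' "$args" "$bench"
}

# Dry run: print the command instead of executing it.
jmh_cmd ShortestPathsBenchmark graphName=wiki-Talk algorithm=graphframes startingVertex=1
```

Printing the command first makes it easy to check which parameters will be passed before starting a long benchmark run.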

Examples

Run all algorithm implementations on wiki-Talk (no `algorithm` parameter given):

sbt "benchmarks/jmh:run -p graphName=wiki-Talk org.graphframes.benchmarks.ShortestPathsBenchmark"

Run only the GraphX implementation:

sbt "benchmarks/jmh:run -p algorithm=graphx -p graphName=cit-Patents org.graphframes.benchmarks.ConnectedComponentsBenchmark"

Run with a custom algorithm iteration limit (`maxIter`):

sbt "benchmarks/jmh:run -p maxIter=5 -p graphName=wiki-Talk org.graphframes.benchmarks.LabelPropagationBenchmark"

Available Graphs

Graph	Vertices	Edges	Weighted
`wiki-Talk`	~2.4M	~5.0M	No
`cit-Patents`	~3.8M	~16.5M	No
`kgs`	~800K	~23.5M	Yes
`graph500-22`	~4.2M	~67M	No
`graph500-23`	~8.4M	~134M	No
`graph500-24`	~16.8M	~268M	No
`graph500-25`	~33.6M	~537M	No

Test graphs (small, for validation):

test-bfs-directed, test-bfs-undirected
test-cdlp-directed, test-cdlp-undirected
test-pr-directed, test-pr-undirected
test-wcc-directed, test-wcc-undirected
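
The small test graphs are convenient for a quick validation pass before a full run. The sketch below prints one command per benchmark for a chosen graph variant. The pairing of test-graph prefixes with benchmarks (`bfs` with `ShortestPathsBenchmark`, `cdlp` with `LabelPropagationBenchmark`, `wcc` with `ConnectedComponentsBenchmark`) is an assumption based on the LDBC algorithm names, and `smoke_cmds` is a hypothetical helper, not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: prints a short validation command for each benchmark.
# ASSUMPTION: the prefix-to-benchmark pairing below is inferred from the
# LDBC algorithm names and is not confirmed by this README.
smoke_cmds() {
  variant="$1"   # "directed" or "undirected"
  for pair in bfs:ShortestPathsBenchmark \
              cdlp:LabelPropagationBenchmark \
              wcc:ConnectedComponentsBenchmark; do
    algo="${pair%%:*}"    # text before the colon, e.g. "bfs"
    bench="${pair#*:}"    # text after the colon, e.g. "ShortestPathsBenchmark"
    # One measurement iteration, no warmup, one fork: validation only.
    printf 'sbt "benchmarks/jmh:run -i 1 -wi 0 -f 1 -p graphName=test-%s-%s org.graphframes.benchmarks.%s"\n' \
      "$algo" "$variant" "$bench"
  done
}

# Dry run: print the three commands for the directed test graphs.
smoke_cmds directed
```

Timings from such a run are not meaningful as measurements; the point is only to confirm that each benchmark starts and completes.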

Data Format

Benchmarks use the LDBC Graphalytics Parquet format:

Vertices: `{graphName}-v.parquet` with column `id`
Edges: `{graphName}-e.parquet` with columns `src`, `dst` (and `weight` for weighted graphs)

Data is downloaded on first use and cached in `target/ldbc-cache/`.
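
To see whether a graph is already in the cache, the naming scheme above can be checked directly. This is a minimal sketch: `graph_cached` is a hypothetical helper, and it assumes the two Parquet datasets sit directly under the cache directory, which this README does not state.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if both Parquet datasets
# for a graph exist in the cache directory.
# ASSUMPTION: a flat layout, i.e. <cache_dir>/<graphName>-v.parquet and
# <cache_dir>/<graphName>-e.parquet; the real layout may differ.
graph_cached() {
  name="$1"
  cache_dir="${2:-target/ldbc-cache}"
  # -e matches files and directories alike, since a Parquet dataset
  # may be stored as either.
  [ -e "$cache_dir/${name}-v.parquet" ] && [ -e "$cache_dir/${name}-e.parquet" ]
}

if graph_cached wiki-Talk; then
  echo "wiki-Talk: cached"
else
  echo "wiki-Talk: not cached, will be downloaded on first use"
fi
```

Pass a second argument to check a different directory, for example `graph_cached kgs /path/to/cache`.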

Notes

Benchmarks require ~10 GB of heap memory (configured automatically).
Data files are cached locally to avoid repeated downloads.
The old `LDBCBenchmarkSuite` is deprecated; use the per-algorithm benchmarks listed under Available Benchmarks instead.
