GitHub - ShreeChaturvedi/fast-mnist-nn: A very fast NN implementation on the MNIST dataset with custom training, evaluation, and matrix multiplication.

High-performance C++ neural network for MNIST digit recognition with SIMD kernels, OpenMP, and reproducible benchmarks.

Highlights

SIMD-accelerated matrix ops (AVX2/AVX-512/NEON) with aligned storage.
OpenMP-aware hot paths for dot, transpose, and axpy.
P2 PGM parser with in-memory + on-disk cache for repeat runs.
CLI training + evaluation pipeline with configurable epochs.
Catch2 tests wired to CTest.
Google Benchmark suite with published results + charts.
Doxygen docs target and clang-format config.
CI on Linux/macOS/Windows via GitHub Actions.
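
For readers unfamiliar with the P2 format mentioned above: P2 is the plain (ASCII) variant of Netpbm PGM, consisting of the magic string P2, then width, height, and maximum gray value, followed by one integer per pixel, with # starting a comment. The following is a minimal sketch of such a parser and is not the project's own implementation. The names PgmImage, next_int, and parse_p2_pgm are hypothetical, and the caching layer is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for a decoded image; not the project's own type.
struct PgmImage {
    int width = 0;
    int height = 0;
    int max_value = 0;
    std::vector<std::uint16_t> pixels;  // row-major, width * height entries
};

// Reads the next integer token, skipping whitespace and '#' comment lines.
static int next_int(std::istream& in) {
    for (;;) {
        in >> std::ws;
        if (in.peek() == '#') {
            in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
            continue;
        }
        int value = 0;
        if (!(in >> value)) {
            throw std::runtime_error("PGM: expected an integer");
        }
        return value;
    }
}

// Parses a plain PGM stream: magic "P2", width, height, maxval, then pixels.
PgmImage parse_p2_pgm(std::istream& in) {
    std::string magic;
    in >> magic;
    if (magic != "P2") {
        throw std::runtime_error("PGM: not a P2 file");
    }

    PgmImage img;
    img.width = next_int(in);
    img.height = next_int(in);
    img.max_value = next_int(in);
    if (img.width <= 0 || img.height <= 0 ||
        img.max_value <= 0 || img.max_value > 65535) {
        throw std::runtime_error("PGM: invalid header");
    }

    const std::size_t count =
        static_cast<std::size_t>(img.width) * static_cast<std::size_t>(img.height);
    img.pixels.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const int v = next_int(in);
        if (v < 0 || v > img.max_value) {
            throw std::runtime_error("PGM: pixel out of range");
        }
        img.pixels.push_back(static_cast<std::uint16_t>(v));
    }
    return img;
}
```

Passing a std::ifstream opened on a .pgm file, or a std::istringstream holding its text, to parse_p2_pgm yields the decoded pixels in row-major order.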

Quickstart

python3 tools/run.py

This downloads MNIST, builds the project, and runs a training pass. Use python3 tools/run.py --help for flags.

Benchmarks

Run files:

docs/benchmarks/runs/bench-20251226-154121-baseline.json
docs/benchmarks/runs/bench-20251226-154121-native.json
docs/benchmarks/runs/bench-20251226-154121-openmp-native.json

Configs:

baseline: OpenMP off, native off
native: OpenMP off, native on
openmp+native: OpenMP on, native on

Environment:

Apple M2, macOS 15.5
Apple clang 17.0.0
Release (-O3, OpenMP on/off, -march=native on/off)

Matrix ops (ns/op, lower is better):

Case	baseline	native	openmp+native
dot 32	`6165`	`6229`	`6287`
dot 64	`65252`	`57222`	`89130`
dot 128	`575281`	`587767`	`374400`
dot 256	`4835360`	`4759132`	`1379835`
transpose 128	`5441`	`5292`	`23662`
transpose 256	`23098`	`22104`	`31108`
transpose 512	`198735`	`178676`	`87914`
transpose 1024	`978383`	`861078`	`502426`
axpy 128	`3486`	`3477`	`23917`
axpy 256	`13886`	`13896`	`26335`
axpy 512	`55848`	`55441`	`35846`
axpy 1024	`230626`	`229230`	`114910`

Training/inference throughput (img/s, higher is better):

Case	baseline	native	openmp+native
learn step	`48755`	`49399`	`48636`
classify	`81628`	`80712`	`69994`

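OpenMP overhead dominates at smaller sizes: with OpenMP enabled, axpy 128 takes 23917 ns versus 3486 ns at baseline (about 6.9x slower), and classify throughput falls from 81628 to 69994 img/s. Parallelism pays off from size 128 for dot (dot 256 is about 3.5x faster than baseline) and from size 512 for transpose and axpy; the line charts illustrate where the crossover happens.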
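
One common way to keep OpenMP overhead off small inputs is the if clause on a parallel region, which runs the loop on a single thread below a size cutoff. The sketch below illustrates the idea for axpy. It is an assumption about how such a guard could look, not the project's actual kernel, and both the function signature and the kParallelThreshold value are hypothetical; a real cutoff should come from measurements like the ones in the table.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cutoff: below this many elements, stay on one thread.
// A real value should be tuned from benchmark results.
constexpr std::ptrdiff_t kParallelThreshold = 1 << 16;

// Computes y <- a * x + y over the common prefix of x and y.
void axpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t len = x.size() < y.size() ? x.size() : y.size();
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(len);
    const float* xp = x.data();
    float* yp = y.data();

    // The if clause disables the thread team for small n, so short vectors
    // avoid fork/join cost. Without OpenMP enabled at compile time the
    // pragma is ignored and the loop runs serially.
    #pragma omp parallel for if(n >= kParallelThreshold) schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}
```

The result is the same with or without OpenMP; only the scheduling changes, so the guard can be tuned without affecting correctness.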

See docs/benchmarks/benchmarks.md for methodology and scripts.

Run Benchmarks

python3 tools/run_benchmarks.py --openmp --native

Build and Test

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
ctest --test-dir build

macOS quickstart:

./tools/bootstrap_macos.sh

Run

./build/fast_mnist_cli data 5000 10 TrainingSetList.txt TestingSetList.txt

Formatting

clang-format -i src/*.cpp include/fast_mnist/*.h apps/*.cpp

Documentation

cmake -S . -B build -DFAST_MNIST_ENABLE_DOXYGEN=ON
cmake --build build --target docs

Data

python3 tools/prepare_mnist.py --output data --list-dir .

The script auto-installs tqdm for progress bars; pass --no-auto-install to skip that step.

License

MIT -- see LICENSE.

