Lightweight TEE-based system for secure ONNX model inference using Intel SGX with automated model partitioning to fit within enclave memory constraints.
Updated Oct 16, 2025 - C
Hunyuan3D-2 fork: image → textured 3D → sliced STL, plus part segmentation. RTX 50-series (Blackwell/sm_120), CUDA 13.0, Python 3.12, PyTorch 2.11+cu130.
Web-Based Distributed LLM Inference
Distributed layer inference for transformer LLMs on edge K3s clusters, with Python/PyTorch and native C++/llama.cpp runtimes, GGUF stage shards, Kubernetes manifests, and an Ops UI for monitoring experiments.