Add portable CUDA 11.8 Docker environment for SLAM-LLM #248

Open
ak4off wants to merge 1 commit into X-LANCE:main from ak4off:add-portable-docker-setup-fairseq-hubert

ak4off commented May 9, 2026

Summary

Adds a portable and reproducible CUDA 11.8 Docker environment for SLAM-LLM research workflows.

This setup is designed to reduce environment-related failures across institutional GPU servers and research clusters.

Includes

  • CUDA 11.8 devel base image
  • PyTorch 2.1.0 (cu118 build)
  • Fairseq (pinned commit)
  • DeepSpeed 0.14.5
  • Transformers (pinned commit)
  • PEFT (pinned commit)
  • nvcc + CUDA build toolchain
  • Multi-GPU training support
  • Docker run helper script
  • Documentation for setup and verification
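
As a rough illustration of the verification step, the sketch below compares installed package versions against the two numeric pins listed above. It is not the script shipped in this PR: the names EXPECTED, installed_version, and check_pins are hypothetical, and the commit-pinned packages are left out because their hashes are not stated here.

```python
from importlib import metadata

# Numeric pins taken from the list above. Packages pinned only by commit
# are omitted because this description does not give their hashes.
EXPECTED = {"torch": "2.1.0", "deepspeed": "0.14.5"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected, lookup=installed_version):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        # Drop a local build tag such as '+cu118' before comparing.
        elif found.split("+")[0] == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for package, status in check_pins(EXPECTED).items():
        print(f"{package}: {status}")
```

Run inside the container, this prints one status line per package. The lookup parameter exists only so the comparison logic can be exercised without the packages installed.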

Motivation

Many users encounter:

  • missing CUDA toolkit (nvcc)
  • Fairseq build failures
  • DeepSpeed compilation issues
  • CUDA/PyTorch mismatch problems
  • inconsistent multi-server environments
  • lack of sudo access on institutional servers

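This Docker setup packages the CUDA toolkit and the pinned dependencies in a single image, so the same environment can run on any server that meets the host requirements listed under Notes.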
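
Two of the failures listed above, a missing nvcc and a CUDA/PyTorch mismatch, can be detected before any build is attempted. The following is a minimal sketch under stated assumptions, not part of this PR: the function names are hypothetical, and it assumes that comparing the major.minor release reported by nvcc with torch.version.cuda is a sufficient check.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the 'major.minor' release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def toolkit_release():
    """Return the toolkit release reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def diagnose(toolkit, torch_cuda):
    """Summarize the toolkit/PyTorch pairing as a short message."""
    if toolkit is None:
        return "nvcc not found: CUDA extensions cannot be compiled"
    if torch_cuda is None:
        return "PyTorch is missing or was built without CUDA"
    if toolkit != torch_cuda:
        return f"mismatch: toolkit {toolkit} vs PyTorch CUDA {torch_cuda}"
    return "ok"


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    print(diagnose(toolkit_release(), torch_cuda))
```

On a host without the toolkit this reports the missing nvcc. Inside an image built as described, both sides would be expected to report 11.8.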

Notes

The image still requires:

  • working NVIDIA host drivers
  • GPU access enabled when the container starts (docker run --gpus all)
  • NVIDIA Container Toolkit

The GPU kernel driver runs on the host and cannot be packaged inside the image.
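
To show how these host requirements surface at launch time, here is a hedged sketch that assembles a docker run command with GPU access enabled. It does not reproduce the helper script included in this PR: the function name, the image tag slam-llm:cu118, and the optional mount are hypothetical placeholders.

```python
import shlex


def docker_run_command(image, command, mount=None):
    """Build an argv list for a throwaway container with all GPUs exposed."""
    # --gpus all only works when the NVIDIA Container Toolkit is installed
    # on the host alongside a working driver.
    argv = ["docker", "run", "--rm", "--gpus", "all"]
    if mount is not None:
        host_path, container_path = mount
        argv += ["-v", f"{host_path}:{container_path}"]
    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Hypothetical image tag; substitute whatever tag the image was built with.
    argv = docker_run_command("slam-llm:cu118", ["nvidia-smi"])
    print(shlex.join(argv))
```

Running nvidia-smi as the container command is a common smoke test: if the driver or the container toolkit is missing on the host, the command typically fails before any training code is involved.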

ak4off (Author) commented May 9, 2026

Tested on multi-GPU ASR finetuning workflows with CUDA 11.8 and DeepSpeed 0.14.5.

This setup was created to improve reproducibility across institutional GPU servers where sudo access and CUDA toolkit installation are restricted.
