fix(candle-kernels): disable BF16 WMMA kernels for pre-Ampere GPUs by DrJesseGlass · Pull Request #3349 · huggingface/candle

DrJesseGlass · 2026-01-30T16:57:31Z

BF16 Tensor Core operations via WMMA require Ampere (sm_80) or newer. Pre-Ampere GPUs cannot compile BF16 WMMA fragment types, causing build failures on GTX 16xx and RTX 20xx cards.

This PR detects the compute capability in build.rs via nvidia-smi and defines NO_BF16_WMMA when the compute cap is below 80.
It then guards the BF16 instantiations in moe_wmma.cu and moe_wmma_gguf.cu, and emits an error message if BF16 is requested at runtime on unsupported hardware.
The FP16 WMMA and non-WMMA paths remain available.
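
As a rough illustration of the detection step described above, the following is a minimal sketch, not the code from this PR. The helper names (`parse_compute_cap`, `detect_compute_cap`, `extra_nvcc_args`) are hypothetical, and the exact nvidia-smi query is an assumption: it relies on the `compute_cap` query field, which only newer drivers expose.

```rust
use std::process::Command;

/// Parse one line of `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`
/// output (e.g. "7.5") into the integer form used by CUDA_COMPUTE_CAP (e.g. 75).
fn parse_compute_cap(line: &str) -> Option<u32> {
    let (major, minor) = line.trim().split_once('.')?;
    let major: u32 = major.parse().ok()?;
    let minor: u32 = minor.parse().ok()?;
    Some(major * 10 + minor)
}

/// Hypothetical helper: ask nvidia-smi for the first GPU's compute capability.
/// Returns None if nvidia-smi is missing, fails, or prints something unexpected.
fn detect_compute_cap() -> Option<u32> {
    let out = Command::new("nvidia-smi")
        .args(["--query-gpu=compute_cap", "--format=csv,noheader"])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    let stdout = String::from_utf8(out.stdout).ok()?;
    parse_compute_cap(stdout.lines().next()?)
}

/// BF16 WMMA fragments need sm_80 or newer, so older targets get the define
/// that compiles the BF16 instantiations out.
fn extra_nvcc_args(compute_cap: u32) -> Vec<&'static str> {
    if compute_cap < 80 {
        vec!["-DNO_BF16_WMMA"]
    } else {
        vec![]
    }
}
```

Keeping the parsing separate from the process call makes the threshold logic easy to check without a GPU; a build script would presumably fall back to an environment variable such as CUDA_COMPUTE_CAP when detection returns None.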

Testing:

Builds and runs on DGX Spark (sm_100, BF16 WMMA enabled)
Hopefully GTX 16xx or RTX 20xx users in #3331 can verify.

OhashiReon · 2026-01-31T01:08:01Z

Thank you for the fix!

I tried building with this, but it still fails on my GTX 1650 Ti.

Here is the build log:

error: failed to run custom build command for `candle-kernels v0.9.2 (https://github.com/DrJesseGlass/candle.git?branch=fix%2Fdisable-bf16-wmma-pre-ampere#9dfdf72b)`

Caused by:
  process didn't exit successfully: `C:\Users\ter\Desktop\candle-test\target\debug\build\candle-kernels-1d80be0eee367860\build-script-build` (exit code: 101)
  --- stdout
  cargo::rerun-if-changed=build.rs
  cargo::rerun-if-changed=src/compatibility.cuh
  cargo::rerun-if-changed=src/cuda_utils.cuh
  cargo::rerun-if-changed=src/binary_op_macros.cuh
  cargo:warning=Detected compute cap: 75
  cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
  cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
  cargo:rustc-env=CUDA_COMPUTE_CAP=75
  cargo:rustc-env=CUDA_INCLUDE_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include
  cargo:rerun-if-changed=src\binary_op_macros.cuh
  cargo:rerun-if-changed=src\compatibility.cuh
  cargo:rerun-if-changed=src\cuda_utils.cuh
  cargo:rerun-if-changed=src\moe\gguf.cuh
  cargo:rerun-if-changed=src\moe\moe_utils.cuh
  cargo:rerun-if-env-changed=NVCC_CCBIN
  cargo:rerun-if-changed=src\affine.cu
  cargo:rerun-if-changed=src\moe\moe_wmma.cu
  cargo:rerun-if-changed=src\conv.cu
  cargo:rerun-if-changed=src\reduce.cu
  affine.cu
  cargo:rerun-if-changed=src\binary.cu
  conv.cu
  cargo:rerun-if-changed=src\fill.cu
  reduce.cu
  cargo:rerun-if-changed=src\sort.cu
  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=half, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 272

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=half, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 275

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(172): error: incomplete type is not allowed
                fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 281

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(173): error: incomplete type is not allowed
                fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 281

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 281

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(172): error: incomplete type is not allowed
                fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 283

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(173): error: incomplete type is not allowed
                fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 283

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 283

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(37): warning #177-D: variable "vllm_rs::NUM_VECS" was declared but never referenced
    constexpr int NUM_VECS = 32;
                  ^

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(40): warning #177-D: variable "vllm_rs::WARPS_PER_BLOCK" was declared but never referenced
    constexpr int WARPS_PER_BLOCK = 4;
                  ^

  4 errors detected in the compilation of "src/moe/moe_wmma.cu".
  moe_wmma.cu
  cargo:rerun-if-changed=src\moe\moe_wmma_gguf.cu
  fill.cu
  cargo:rerun-if-changed=src\indexing.cu
  sort.cu
  cargo:rerun-if-changed=src\ternary.cu
  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=32, block_q_t=block_q8_0, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=32, block_q_t=block_q8_0, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q4_K, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q4_K, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q2_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q2_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q3_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q3_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q5_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q5_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q6_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q6_K, wrap_size=64]" at line 418

  12 errors detected in the compilation of "src/moe/moe_wmma_gguf.cu".
  moe_wmma_gguf.cu
  cargo:rerun-if-changed=src\quantized.cu
  binary.cu
  cargo:rerun-if-changed=src\cast.cu
  indexing.cu
  cargo:rerun-if-changed=src\moe\moe_gguf.cu
  ternary.cu
  cargo:rerun-if-changed=src\unary.cu
  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q2_K, vdr=1, vec_dot_q_cuda=vec_dot_q2_K_q8_1]" at line 4460

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q2_K, vdr=1, vec_dot_q_cuda=vec_dot_q2_K_q8_1]" at line 4460

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q3_K, vdr=1, vec_dot_q_cuda=vec_dot_q3_K_q8_1]" at line 4475

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q3_K, vdr=1, vec_dot_q_cuda=vec_dot_q3_K_q8_1]" at line 4475

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q4_K, vdr=2, vec_dot_q_cuda=vec_dot_q4_K_q8_1]" at line 4490

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q4_K, vdr=2, vec_dot_q_cuda=vec_dot_q4_K_q8_1]" at line 4490

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q5_K, vdr=2, vec_dot_q_cuda=vec_dot_q5_K_q8_1]" at line 4505

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q5_K, vdr=2, vec_dot_q_cuda=vec_dot_q5_K_q8_1]" at line 4505

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q6_K, vdr=1, vec_dot_q_cuda=vec_dot_q6_K_q8_1]" at line 4520

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q6_K, vdr=1, vec_dot_q_cuda=vec_dot_q6_K_q8_1]" at line 4520

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=32, qi=8, block_q_t=block_q8_0, vdr=2, vec_dot_q_cuda=vec_dot_q8_0_q8_1]" at line 4535

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=32, qi=8, block_q_t=block_q8_0, vdr=2, vec_dot_q_cuda=vec_dot_q8_0_q8_1]" at line 4535

  cast.cu
  moe_gguf.cu
  unary.cu
  quantized.cu

DrJesseGlass · 2026-01-31T12:29:48Z

@OhashiReon I resolved the PTX flag issue and confirmed the build with compute_cap=75 (GTX 1650).

OhashiReon · 2026-01-31T15:22:06Z

Thank you for the update!
I got this error with the latest code:

error[E0384]: cannot assign twice to immutable variable `builder`
  --> C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\a4879af\candle-kernels\build.rs:23:9
   |
16 |     let builder = bindgen_cuda::Builder::default()
   |         ------- first assignment to `builder`
...
23 |         builder = builder.arg("-DNO_BF16_WMMA");
   |         ^^^^^^^ cannot assign twice to immutable variable
   |
help: consider making this binding mutable
   |
16 |     let mut builder = bindgen_cuda::Builder::default()
   |         +++

For more information about this error, try `rustc --explain E0384`.
error: could not compile `candle-kernels` (build script) due to 1 previous error

I fixed it by adding mut to line 16.
After that, cargo build -p candle-core --features cuda passed successfully!
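
For readers who hit the same E0384 error, here is a minimal sketch of two ways to add an argument conditionally to a consuming builder. The `Builder` type below is a stand-in written for this example: only the shape of `arg` (takes `self`, returns the builder) is inferred from the error message above, and `-O3` is a placeholder flag rather than what candle-kernels passes.

```rust
/// Stand-in for a consuming builder such as `bindgen_cuda::Builder`.
/// Hypothetical: it only records the arguments it is given.
#[derive(Debug, Default)]
struct Builder {
    args: Vec<String>,
}

impl Builder {
    /// Consumes the builder and returns it with one more argument.
    fn arg(mut self, arg: &str) -> Self {
        self.args.push(arg.to_string());
        self
    }
}

/// Option 1: a mutable binding, as the compiler hint suggests.
fn configure_mut(compute_cap: u32) -> Builder {
    let mut builder = Builder::default().arg("-O3");
    if compute_cap < 80 {
        builder = builder.arg("-DNO_BF16_WMMA");
    }
    builder
}

/// Option 2: keep the binding immutable and branch in an expression.
fn configure_shadowed(compute_cap: u32) -> Builder {
    let builder = Builder::default().arg("-O3");
    if compute_cap < 80 {
        builder.arg("-DNO_BF16_WMMA")
    } else {
        builder
    }
}
```

Both variants produce the same argument list; the mutable binding matches the fix applied in this thread, while the expression form avoids reassignment altogether.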

DrJesseGlass added 3 commits January 30, 2026 11:23

disable BF16 WMMA on compute cap < 80

7e30428

detect compute cap via nvidia-smi

f0a1bf7

error instead of silent no-op when BF16 WMMA unavailable

9dfdf72

DrJesseGlass mentioned this pull request Jan 30, 2026

Unable To Build for Nvidia GeForce GTX 1650 #3331

Open

pass NO_BF16_WMMA flag to PTX builder

a4879af

need mutable builder to add arg

bb4d702

DrJesseGlass mentioned this pull request Jan 31, 2026

fix candle-kernels build for CC < 700 #3300

Closed

DrJesseGlass wants to merge 5 commits into huggingface:main from DrJesseGlass:fix/disable-bf16-wmma-pre-ampere