fix(candle-kernels): disable BF16 WMMA kernels for pre-Ampere GPUs #3349

Open
DrJesseGlass wants to merge 5 commits into huggingface:main from DrJesseGlass:fix/disable-bf16-wmma-pre-ampere


@DrJesseGlass
Contributor

BF16 Tensor Core operations via WMMA require Ampere (sm_80) or newer. When targeting pre-Ampere architectures, nvcc cannot compile BF16 WMMA fragment types, which causes build failures on GTX 16xx and RTX 20xx cards.

This PR detects the compute capability in build.rs via nvidia-smi and defines NO_BF16_WMMA when it is below 80.
It then guards the BF16 instantiations in moe_wmma.cu and moe_wmma_gguf.cu behind that macro, and emits an error message if BF16 is requested at runtime on unsupported hardware.
This way the FP16 WMMA and non-WMMA paths remain available on all supported GPUs.
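
For readers following along, the build-time decision described above can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the helper names `parse_compute_cap`, `detect_compute_cap`, and `extra_nvcc_args` are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports the capability in the `7.5` form.

```rust
use std::process::Command;

/// Hypothetical helper: turn a capability string such as "7.5" into the
/// integer form seen in the build log (e.g. 75).
fn parse_compute_cap(raw: &str) -> Option<u32> {
    let digits: String = raw.trim().chars().filter(|c| *c != '.').collect();
    if digits.is_empty() {
        return None;
    }
    digits.parse().ok()
}

/// Hypothetical helper: ask nvidia-smi for the capability of the first GPU.
/// Returns None if the tool is missing, fails, or prints something unexpected.
fn detect_compute_cap() -> Option<u32> {
    let output = Command::new("nvidia-smi")
        .args(&["--query-gpu=compute_cap", "--format=csv,noheader"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let stdout = String::from_utf8(output.stdout).ok()?;
    stdout.lines().next().and_then(parse_compute_cap)
}

/// Hypothetical helper: extra nvcc arguments for a given capability.
/// Below 80 (pre-Ampere) the BF16 WMMA instantiations are compiled out.
fn extra_nvcc_args(compute_cap: u32) -> Vec<&'static str> {
    if compute_cap < 80 {
        vec!["-DNO_BF16_WMMA"]
    } else {
        Vec::new()
    }
}
```

A build script could call `detect_compute_cap()` once and pass the result of `extra_nvcc_args` to the kernel compiler. Keeping the parsing separate from the `nvidia-smi` call makes the threshold logic checkable on machines without a GPU.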

Testing:

Builds and runs on DGX Spark (sm_100, BF16 WMMA enabled)
Hopefully GTX 16xx or RTX 20xx users in #3331 can verify.

@OhashiReon

Thank you for the fix!

I tried building with this, but it still fails on my GTX 1650 Ti.

Here is the build log

error: failed to run custom build command for `candle-kernels v0.9.2 (https://github.com/DrJesseGlass/candle.git?branch=fix%2Fdisable-bf16-wmma-pre-ampere#9dfdf72b)`

Caused by:
  process didn't exit successfully: `C:\Users\ter\Desktop\candle-test\target\debug\build\candle-kernels-1d80be0eee367860\build-script-build` (exit code: 101)
  --- stdout
  cargo::rerun-if-changed=build.rs
  cargo::rerun-if-changed=src/compatibility.cuh
  cargo::rerun-if-changed=src/cuda_utils.cuh
  cargo::rerun-if-changed=src/binary_op_macros.cuh
  cargo:warning=Detected compute cap: 75
  cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
  cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
  cargo:rustc-env=CUDA_COMPUTE_CAP=75
  cargo:rustc-env=CUDA_INCLUDE_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include
  cargo:rerun-if-changed=src\binary_op_macros.cuh
  cargo:rerun-if-changed=src\compatibility.cuh
  cargo:rerun-if-changed=src\cuda_utils.cuh
  cargo:rerun-if-changed=src\moe\gguf.cuh
  cargo:rerun-if-changed=src\moe\moe_utils.cuh
  cargo:rerun-if-env-changed=NVCC_CCBIN
  cargo:rerun-if-changed=src\affine.cu
  cargo:rerun-if-changed=src\moe\moe_wmma.cu
  cargo:rerun-if-changed=src\conv.cu
  cargo:rerun-if-changed=src\reduce.cu
  affine.cu
  cargo:rerun-if-changed=src\binary.cu
  conv.cu
  cargo:rerun-if-changed=src\fill.cu
  reduce.cu
  cargo:rerun-if-changed=src\sort.cu
  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=half, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 272

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=half, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 275

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(172): error: incomplete type is not allowed
                fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 281

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(173): error: incomplete type is not allowed
                fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 281

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=16, WMMA_N=16, WARPS_N=2]" at line 281

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(172): error: incomplete type is not allowed
                fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 283

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(173): error: incomplete type is not allowed
                fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                         ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 283

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(110): warning #177-D: variable "laneId" was declared but never referenced
        const int laneId = threadId % 32;
                  ^
            detected during instantiation of "void vllm_rs::moe_gemm_grouped_kernel<T,WMMA_M,WMMA_N,WARPS_N>(const T *, const T *, const int32_t *, const int32_t *, const float *, T *, int, int, int32_t, int32_t, int32_t) [with T=nv_bfloat16, WMMA_M=8, WMMA_N=32, WARPS_N=1]" at line 283

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(37): warning #177-D: variable "vllm_rs::NUM_VECS" was declared but never referenced
    constexpr int NUM_VECS = 32;
                  ^

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma.cu(40): warning #177-D: variable "vllm_rs::WARPS_PER_BLOCK" was declared but never referenced
    constexpr int WARPS_PER_BLOCK = 4;
                  ^

  4 errors detected in the compilation of "src/moe/moe_wmma.cu".
  moe_wmma.cu
  cargo:rerun-if-changed=src\moe\moe_wmma_gguf.cu
  fill.cu
  cargo:rerun-if-changed=src\indexing.cu
  sort.cu
  cargo:rerun-if-changed=src\ternary.cu
  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=32, block_q_t=block_q8_0, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=32, block_q_t=block_q8_0, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q4_K, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q4_K, wrap_size=32]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q2_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q2_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q3_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q3_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q5_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q5_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(259): error: incomplete type is not allowed
                    fragment<matrix_a, WMMA_M, WMMA_N, WMMA_K, T, row_major> a_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q6_K, wrap_size=64]" at line 418

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\moe\moe_wmma_gguf.cu(260): error: incomplete type is not allowed
                    fragment<matrix_b, WMMA_M, WMMA_N, WMMA_K, T, col_major> b_frag;
                                                                             ^
            detected during instantiation of "void moe_gemm_gguf_prefill_kernel<T,qk,block_q_t,wrap_size>(const T *, const uint8_t *, const int32_t *, const int32_t *, const float *, float *, int, int, int32_t, int32_t, int32_t, int) [with T=nv_bfloat16, qk=256, block_q_t=block_q6_K, wrap_size=64]" at line 418

  12 errors detected in the compilation of "src/moe/moe_wmma_gguf.cu".
  moe_wmma_gguf.cu
  cargo:rerun-if-changed=src\quantized.cu
  binary.cu
  cargo:rerun-if-changed=src\cast.cu
  indexing.cu
  cargo:rerun-if-changed=src\moe\moe_gguf.cu
  ternary.cu
  cargo:rerun-if-changed=src\unary.cu
  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q2_K, vdr=1, vec_dot_q_cuda=vec_dot_q2_K_q8_1]" at line 4460

  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q2_K, vdr=1, vec_dot_q_cuda=vec_dot_q2_K_q8_1]" at line 4460

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q3_K, vdr=1, vec_dot_q_cuda=vec_dot_q3_K_q8_1]" at line 4475

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=16, block_q_t=block_q3_K, vdr=1, vec_dot_q_cuda=vec_dot_q3_K_q8_1]" at line 4475

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q4_K, vdr=2, vec_dot_q_cuda=vec_dot_q4_K_q8_1]" at line 4490

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q4_K, vdr=2, vec_dot_q_cuda=vec_dot_q4_K_q8_1]" at line 4490

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q5_K, vdr=2, vec_dot_q_cuda=vec_dot_q5_K_q8_1]" at line 4505

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q5_K, vdr=2, vec_dot_q_cuda=vec_dot_q5_K_q8_1]" at line 4505

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q6_K, vdr=1, vec_dot_q_cuda=vec_dot_q6_K_q8_1]" at line 4520

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=256, qi=32, block_q_t=block_q6_K, vdr=1, vec_dot_q_cuda=vec_dot_q6_K_q8_1]" at line 4520

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4405): warning #177-D: variable "ncols_y" was declared but never referenced
        constexpr int ncols_y = 1;
                      ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=32, qi=8, block_q_t=block_q8_0, vdr=2, vec_dot_q_cuda=vec_dot_q8_0_q8_1]" at line 4535

  C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\9dfdf72\candle-kernels\src\quantized.cu(4417): warning #177-D: variable "blocks_per_col_y" was declared but never referenced
        const int blocks_per_col_y = k_padded / 32;
                  ^
            detected during instantiation of "void indexed_moe_forward<qk,qi,block_q_t,vdr,vec_dot_q_cuda>(const void *, const void *, const unsigned int *, float *, int, int, int, int, int, int) [with qk=32, qi=8, block_q_t=block_q8_0, vdr=2, vec_dot_q_cuda=vec_dot_q8_0_q8_1]" at line 4535

  cast.cu
  moe_gguf.cu
  unary.cu
  quantized.cu

@DrJesseGlass
Contributor Author

@OhashiReon I resolved the PTX flag issue and confirmed the build with compute_cap=75 (GTX 1650).

@OhashiReon

Thank you for the update!
I got this error with the latest code:

error[E0384]: cannot assign twice to immutable variable `builder`
  --> C:\Users\ter\.cargo\git\checkouts\candle-8b676b8d504f3125\a4879af\candle-kernels\build.rs:23:9
   |
16 |     let builder = bindgen_cuda::Builder::default()
   |         ------- first assignment to `builder`
...
23 |         builder = builder.arg("-DNO_BF16_WMMA");
   |         ^^^^^^^ cannot assign twice to immutable variable
   |
help: consider making this binding mutable
   |
16 |     let mut builder = bindgen_cuda::Builder::default()
   |         +++

For more information about this error, try `rustc --explain E0384`.
error: could not compile `candle-kernels` (build script) due to 1 previous error

I fixed it by adding mut to line 16.
After that, cargo build -p candle-core --features cuda passed successfully!
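
The pattern behind this E0384 error can be sketched in isolation. The `Builder` type below is a hypothetical stand-in with a consuming `arg` method, not the real `bindgen_cuda::Builder`, and the `-O3` argument is only a placeholder. The point is that a binding which is conditionally reassigned must be declared `let mut`.

```rust
/// Hypothetical stand-in for a consuming builder; not the bindgen_cuda type.
#[derive(Debug, Default)]
struct Builder {
    args: Vec<String>,
}

impl Builder {
    /// Consumes the builder and returns it with one more argument appended.
    fn arg(mut self, arg: &str) -> Self {
        self.args.push(arg.to_string());
        self
    }
}

/// Builds the argument list for a given compute capability.
fn configure(compute_cap: u32) -> Builder {
    // `mut` is required here because the binding is reassigned below;
    // without it rustc reports E0384.
    let mut builder = Builder::default().arg("-O3"); // placeholder argument
    if compute_cap < 80 {
        builder = builder.arg("-DNO_BF16_WMMA");
    }
    builder
}
```

An alternative that avoids `mut` is to shadow the binding with an `if`/`else` expression, for example `let builder = if compute_cap < 80 { builder.arg("-DNO_BF16_WMMA") } else { builder };`.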
