100 questions
0 votes, 1 answer, 202 views
GCC offers a _Float16 type, but - what about the functions to work with it?
GCC offers a 16-bit floating point type, outside of the C language standard: _Float16 - at least for x86_64. This allowance is described here.
However - the GCC documentation does not seem to indicate ...
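A minimal C sketch of the usual workaround, assuming GCC with _Float16 support: promote to float or double and use the ordinary <math.h> functions, since no _Float16-suffixed math functions appear to be documented.

    /* sketch: operate on _Float16 by promoting to float for libm calls (link with -lm) */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        _Float16 x = (_Float16)0.5f;   /* initialize via an explicit cast */
        float    y = sinf((float)x);   /* promote to float for the libm call */
        _Float16 z = (_Float16)y;      /* narrow the result back if needed */
        printf("%f\n", (double)z);     /* promote again for printf */
        return 0;
    }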
0 votes, 1 answer, 70 views
How do I check, using CMake, whether _Float16 is supported by my compiler?
I have a C project configured with CMake. Some program within this project uses _Float16 (a "half-precision" type). I know how to determine, within the code, whether _Float16 is available:
...
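For reference, the in-code check the excerpt alludes to can look like the sketch below; it assumes GCC/Clang, which predefine __FLT16_MANT_DIG__ whenever _Float16 is usable, and the same snippet can be fed to CMake's check_c_source_compiles.

    /* sketch: compile-time probe for _Float16 (assumes GCC/Clang macro naming) */
    #if defined(__FLT16_MANT_DIG__)
    typedef _Float16 half_t;   /* the compiler advertises _Float16 */
    #else
    typedef float half_t;      /* fall back to single precision */
    #endif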
4 votes, 2 answers, 223 views
Can I printf a half-precision floating-point value?
I have a _Float16 half-precision variable named x in my C program, and would like to printf() it. Now, I can write: printf("%f", (double) x);, and this will work; but - can I printf x ...
6 votes, 1 answer, 570 views
How do I convert a `float` to a `_Float16`, or even initialize a `_Float16`? (And/or print with printf?)
I'm developing a library which uses _Float16s for many of the constants to save space when passing them around. However, just testing, it seems that telling GCC to just "set it to 1" isn't ...
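A minimal C sketch of initializing and converting a _Float16, assuming a GCC/Clang target where the type is supported; plain casts perform the narrowing and widening:

    #include <stdio.h>

    int main(void) {
        _Float16 h = (_Float16)1.0f;   /* "set it to 1" via an explicit cast */
        float    f = 3.14159f;
        _Float16 g = (_Float16)f;      /* float -> _Float16 narrowing (round to nearest) */
        float    b = (float)g;         /* _Float16 -> float widening is exact */
        printf("%f %f %f\n", (double)h, (double)g, (double)b);  /* promote for printf */
        return 0;
    }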
1 vote, 0 answers, 62 views
Flipping a single bit of a floating-point number (IEEE 754) mathematically
I'm working on implementing a mathematical approach to bit flipping in IEEE 754 FP16 floating-point numbers without using direct bit manipulation. The goal is to flip a specific bit (particularly in ...
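For comparison, the direct bit-manipulation baseline that the question wants to avoid is a short routine over the 16-bit pattern; this sketch assumes a compiler with _Float16 and uses memcpy for the reinterpretation:

    #include <stdint.h>
    #include <string.h>

    /* toggle bit `bit` (0 = LSB of the significand, 15 = sign) of an FP16 value */
    static _Float16 flip_bit(_Float16 x, unsigned bit) {
        uint16_t u;
        memcpy(&u, &x, sizeof u);      /* reinterpret the 16-bit pattern */
        u ^= (uint16_t)(1u << bit);    /* flip the requested bit */
        memcpy(&x, &u, sizeof x);
        return x;
    }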
2 votes, 0 answers, 149 views
Does cuBLAS support mixed precision matrix multiplication in the form C[f32] = A[bf16] * B[f32]?
I'm concerned with mixed precision in deep-learning LLMs. The intermediates are mostly F32, and the weights could be any other type, like BF16, F16, or even quantized types such as Q8_0 or Q4_0. It would be very useful if ...
1 vote, 1 answer, 582 views
Do all processors supporting AVX2 support F16C?
Is it safe to assume that all machines on which AVX2 is supported also support F16C instructions? I haven't yet encountered any machine that doesn't. Thanks
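The two features have separate CPUID bits, so a sketch like the following checks them independently rather than inferring one from the other; it assumes GCC or Clang, whose __builtin_cpu_supports recognizes the "avx2" and "f16c" feature names:

    #include <stdio.h>

    int main(void) {
        __builtin_cpu_init();   /* populate the CPU feature cache */
        printf("AVX2: %d\n", __builtin_cpu_supports("avx2"));
        printf("F16C: %d\n", __builtin_cpu_supports("f16c"));
        return 0;
    }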
2 votes, 1 answer, 108 views
float16_t rounding on ARM NEON
I am implementing emulation of ARM float16_t for X64 using SSE; the idea is to have bit-exact values on both platforms. I mostly finished the implementation, except for one thing, I cannot correctly ...
0 votes, 1 answer, 67 views
What makes `print(np.half(500.2))` differ from `print(f"{np.half(500.2)}")`?
I've been learning about floating-point truncation errors recently, but I found that print(np.half(500.2)) and print(f"{np.half(500.2)}") yield different results. Here are the logs I got in ...
-2 votes, 1 answer, 667 views
Why do BF16 models have slower inference on Mac M-series chips compared to F16 models?
I read on https://github.com/huggingface/smollm/tree/main/smol_tools (mirror 1):
All models are quantized to 16-bit floating-point (F16) for efficient inference. Training was done on BF16, but in our ...
3 votes, 2 answers, 791 views
How can I convert an integer to CUDA's __half FP16 type, in a constexpr fashion?
I'm the developer of aerobus and I'm facing difficulties with half precision arithmetic.
At some point in the library, I need to convert an IntType to the related FloatType (same bit count) in a constexpr ...
0 votes, 1 answer, 137 views
What is the difference, if any, between model.half() and model.to(dtype=torch.float16) in huggingface-transformers?
Example:
# pip install transformers
from transformers import AutoModelForTokenClassification, AutoTokenizer
# Load model
model_path = 'huawei-noah/TinyBERT_General_4L_312D'
model = ...
-1 votes, 1 answer, 3k views
I load a float32 Hugging Face model, cast it to float16, and save it. How can I load it as float16?
I load a huggingface-transformers float32 model, cast it to float16, and save it. How can I load it as float16?
Example:
# pip install transformers
from transformers import ...
0 votes, 1 answer, 777 views
Is there any point in setting `fp16_full_eval=True` if training in `fp16`?
I train a Huggingface model with fp16=True, e.g.:
training_args = TrainingArguments(
output_dir="./results",
evaluation_strategy="epoch",
learning_rate=4e-5,
...
6 votes, 1 answer, 1k views
AVX-512 BF16: load bf16 values directly instead of converting from fp32
On CPUs with AVX-512 and BF16 support, you can use the 512-bit vector registers to store thirty-two 16-bit floats.
I have found intrinsics to convert FP32 values to BF16 values (for example: ...
0 votes, 1 answer, 330 views
Xcode Apple Silicon not compiling ARM64 half-precision NEON instructions: Invalid operand for instruction
To date I have had no issue compiling and running complex ARM Neon assembly language routines in Xcode/CLANG, and the Apple M1 supposedly supports ARMv8.4.
But - when I try to use half precision with ...
0 votes, 1 answer, 143 views
std::floating_point concept in CUDA for all IEEE 754 types
I would like to know if CUDA provides a concept similar to std::floating_point but including all IEEE 754 types, e.g. __half. I provide below a sample code that tests that the __half template ...
0 votes, 0 answers, 209 views
Why doesn't /proc/cpuinfo contain fp16 if FEAT_FP16 is supported?
I know that FEAT_FP16 is supported on my ARM CPU.
I expect to see fp16 in the list of features reported by cat /proc/cpuinfo:
$ cat /proc/cpuinfo | grep fp | sort -u
Features : fp asimd evtstrm aes ...
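A sketch of an alternative probe on AArch64 Linux, assuming the HWCAP_FPHP / HWCAP_ASIMDHP bits from <asm/hwcap.h>; the kernel reports these features as fphp and asimdhp rather than fp16:

    #include <stdio.h>
    #include <sys/auxv.h>
    #include <asm/hwcap.h>

    int main(void) {
        unsigned long caps = getauxval(AT_HWCAP);
        printf("FEAT_FP16 scalar (fphp):  %s\n", (caps & HWCAP_FPHP)    ? "yes" : "no");
        printf("FEAT_FP16 SIMD (asimdhp): %s\n", (caps & HWCAP_ASIMDHP) ? "yes" : "no");
        return 0;
    }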
4 votes, 2 answers, 540 views
How to call _mm256_mul_ph from rust?
_mm256_mul_ps is the Intel intrinsic for "Multiply packed single-precision (32-bit) floating-point elements". _mm256_mul_ph is the intrinsic for "Multiply packed half-precision (16-bit) ...
1 vote, 1 answer, 583 views
M2 Mac YOLOv8 Training: RuntimeError: "upsample_nearest2d_channels_last" not implemented for 'Half'
I want to train a Yolov8 model on a custom dataset with my Mac and this is my first time working on deep learning. Unfortunately, I experienced an error,
RuntimeError: "...
0 votes, 1 answer, 92 views
Convert generic type to Half value allocation-free
In an application that can write numeric values to a file using BinaryWriter I have a class that is typed to the number type that should be used for the file. It looks like this:
class ValueCollection<...
3 votes, 1 answer, 436 views
How to use float16 neon intrinsics on Android?
How do I use arm float16 intrinsics on Android?
Consider the following program:
#include <arm_neon.h>
int main(int, char** argv) {
const float16x8_t a = vdupq_n_f16(1.0F);
const ...
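A sketch of what is typically required, assuming Clang from the NDK: the fp16 arithmetic intrinsics generally need the target feature enabled, e.g. -march=armv8.2-a+fp16 for arm64-v8a:

    /* compile with e.g.: clang --target=aarch64-linux-android24 -march=armv8.2-a+fp16 -c f16.c */
    #include <arm_neon.h>

    float16x8_t double_all(float16x8_t v) {
        const float16x8_t two = vdupq_n_f16((float16_t)2.0f);
        return vmulq_f16(v, two);   /* vmulq_f16 needs the fp16 arithmetic extension */
    }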
2 votes, 2 answers, 638 views
How do I print the half-precision / bfloat16 values from in a (binary) file?
This is a variant of:
How to print float value from binary file in shell?
In that question, we wanted to print IEEE 754 single-precision (i.e. 32-bit) floating-point values from a binary file.
Now ...
1 vote, 1 answer, 879 views
How can I do arithmetic on CUDA's __half type in host side code?
I have a kernel I'm running on an NVIDIA GPU, which uses the FP16 type __half, provided by cuda_fp16.hpp. To check something about its behavior, I also want to manipulate such __half values on the CPU....
0 votes, 1 answer, 243 views
Different methods to unpack CUDA half2 datatypes
I have some CUDA code which uses the half2 datatype. It should be just two 16 bit floating point numbers packed together in a 32 bit space.
Apparently there are the methods __low2half and __high2half ...
1 vote, 0 answers, 270 views
Clarification on IEEE 754 rounding to nearest, ties to even
I am working on an IEEE 754 16-bit adder, and I am confused at the round to nearest, ties to even logic.
The first addition which confuses me is 169.8 (0x594E) + -0.06256 (0xAC01).
After shifting and ...
0 votes, 0 answers, 90 views
Precision loss reading from `r16Snorm` texture to `half` variable in Metal
Am I correct in my assumption that reading a value from an .r16Snorm texture into the Metal Shading Language half data type always unavoidably incurs precision loss? It wasn't obvious to me from the start ...
5 votes, 3 answers, 872 views
On an NVIDIA GPU, does __hmul use the FP32 cores?
Refer to https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/: each SM has three types of CUDA cores, e.g. INT32, FP32 and FP64 cores. If the datatype is INT32/FP32/FP64, I think the ...
1 vote, 1 answer, 1k views
Using bfloat16 with C++23 on x86 CPUs using g++13
I'm trying to use bfloat16 as a format in an application for work on HPC clusters. For this I've installed g++ 13, which supposedly supports the bfloat16 format, but this hasn't been working ...
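A minimal sketch that should compile with GCC 13 on x86-64, shown in C with the __bf16 storage type (C++23 additionally offers std::bfloat16_t via <stdfloat>); arithmetic is done here by converting through float:

    #include <stdio.h>

    int main(void) {
        __bf16 a = (__bf16)1.5f;          /* bf16 storage type */
        __bf16 b = (__bf16)0.25f;
        float  s = (float)a + (float)b;   /* do the arithmetic in float */
        printf("%f\n", (double)s);
        return 0;
    }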
1 vote, 3 answers, 4k views
How to convert a float to a half type and the other way around in C
How can I convert a float (float32) to a half (float16) and the other way around in C while accounting for edge cases like NaN, Infinity etc.
I don't need arithmetic because I just need the types in ...
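If the toolchain provides _Float16 (GCC or Clang on x86-64 or AArch64), plain casts already give correctly rounded conversions that handle NaN, infinities and subnormals; a sketch under that assumption:

    #include <stdint.h>
    #include <string.h>

    /* float -> IEEE 754 binary16 bit pattern (round to nearest even) */
    static uint16_t f32_to_f16_bits(float f) {
        _Float16 h = (_Float16)f;
        uint16_t bits;
        memcpy(&bits, &h, sizeof bits);
        return bits;
    }

    /* IEEE 754 binary16 bit pattern -> float (exact) */
    static float f16_bits_to_f32(uint16_t bits) {
        _Float16 h;
        memcpy(&h, &bits, sizeof h);
        return (float)h;
    }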
0 votes, 1 answer, 269 views
Using half precision with CuPy
I am trying to compile a simple CUDA kernel with CuPy using the half precision format provided by the cuda_fp16 header file.
My kernel looks like this:
code = r'''
extern "C" {
#include <...
0 votes, 0 answers, 125 views
16-bit floating point division (half-precision)?
How can I divide a 16-bit floating-point number by another 16-bit floating-point number (half precision)?
I did the sign with an XOR gate and the exponent with a 5-bit subtractor, but couldn't do the mantissa.
How can I ...
0 votes, 1 answer, 2k views
List of ARM instructions implementing half-precision floating-point arithmetic
Arm Architecture Reference Manual for A-profile architecture (emphasis added):
FPHP, bits [27:24]
0b0011 As for 0b0010, and adds support for half-precision floating-point arithmetic.
A simple ...
2 votes, 0 answers, 102 views
Reciprocal of fp16 in OpenCL
In my OpenCL kernel I use 16-bit floating-point values of type half from the cl_khr_fp16 extension.
Although this gives me code that works well, I noticed with AMD's Radeon developer tools that the ...
3 votes, 0 answers, 151 views
How to verify if the tensorflow code trains completely in FP16?
I'm trying to train TensorFlow (version 2.11.0) code in float16.
I checked that FP16 is supported on the RTX 3090 GPU. So, I followed the link below to train the whole model in reduced precision.
...
1 vote, 1 answer, 986 views
Can language model inference on a CPU, save memory by quantizing?
For example, according to https://cocktailpeanut.github.io/dalai/#/ the relevant figures for LLaMA-65B are:
Full: The model takes up 432.64GB
Quantized: 5.11GB * 8 = 40.88GB
The full model won't fit ...
0 votes, 0 answers, 26 views
Deviation caused by half() in PyTorch
I have run into an issue where the value of a tensor is 6.3982e-2 in float32. After I converted it to float16 using the half() function, it became 6.3965e-2. Is there a method to convert the tensor without ...
1 vote, 1 answer, 2k views
atomicAdd half-precision floating-point (FP16) on CUDA Compute Capability 5.2
I am trying to atomically add a float value to a __half in CUDA, on a compute capability 5.2 device. This architecture does support the __half data type and its conversion functions, but it does not include any arithmetic or atomic ...
0 votes, 0 answers, 742 views
Is there a reason why a NaN value appears when there is no NaN value in the model parameters?
I want to train the model with FP32 and perform inference with FP16.
For other networks (ResNet) with FP16, it worked.
But EDSR (super resolution) with FP16 did not work.
The differences I found are ...
0 votes, 1 answer, 897 views
Can float16 data type save compute cycles while computing transcendental functions?
It's clear that float16 can save bandwidth, but can float16 also save compute cycles when computing transcendental functions, like exp()?
38 votes, 2 answers, 6k views
Why is operating on Float64 faster than Float16?
I wonder why operating on Float64 values is faster than operating on Float16:
julia> rnd64 = rand(Float64, 1000);
julia> rnd16 = rand(Float16, 1000);
julia> @benchmark rnd64.^2
...
0 votes, 1 answer, 434 views
What are vector division and multiplication as in CUDA __half2 arithmetic?
__device__ __half2 __h2div ( const __half2 a, const __half2 b )
Description:
Divides half2 input vector a by input vector b in round-to-nearest mode.
__device__ __half2 __hmul2 ( const __half2 a, ...
1 vote, 0 answers, 296 views
How to round up or down when converting f32 to bf16 in rust?
I am converting from f32 to bf16 in rust, and want to control the direction of the rounding error. Is there an easy way to do this?
Converting using the standard bf16::to_f32 rounds to the nearest ...
4 votes, 1 answer, 197 views
float.h-like definitions for IEEE 754 binary16 half floats
I'm using half floats as implemented in the SoftFloat library (read: 100% IEEE 754 compliant), and, for the sake of completeness, I wish to provide my code with definitions equivalent to those ...
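For reference, the values follow directly from the 1 + 5 + 10 bit layout of binary16; a sketch of such definitions, with macro names modeled on C23's FLT16_* (shown for illustration, not taken from SoftFloat):

    /* IEEE 754 binary16 characteristics (names modeled on C23 <float.h> FLT16_*) */
    #define FLT16_MANT_DIG    11                        /* significand bits incl. the hidden bit */
    #define FLT16_DIG         3
    #define FLT16_EPSILON     9.765625e-04F             /* 2^-10 */
    #define FLT16_MIN_EXP     (-13)
    #define FLT16_MIN         6.103515625e-05F          /* 2^-14, smallest normal */
    #define FLT16_TRUE_MIN    5.9604644775390625e-08F   /* 2^-24, smallest subnormal */
    #define FLT16_MAX_EXP     16
    #define FLT16_MAX         65504.0F                  /* (2 - 2^-10) * 2^15 */
    #define FLT16_MIN_10_EXP  (-4)
    #define FLT16_MAX_10_EXP  4
    #define FLT16_DECIMAL_DIG 5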
1 vote, 1 answer, 3k views
Convert 16 bit hex value to FP16 in Python?
I'm trying to write a basic FP16-based calculator in Python to help me debug some hardware. I can't seem to find how to convert 16-bit hex values into floating-point values I can use in my code to do the ...
2 votes, 2 answers, 1k views
Double vs Float vs _Float16 (Running Time)
I have a simple question about the C language. I am implementing half-precision software using _Float16 in C (my Mac is ARM-based), but the running time is not much faster than single or double precision ...
8 votes, 1 answer, 2k views
Why does bfloat16 have so many exponent bits?
It's clear why a 16-bit floating-point format has started seeing use for machine learning; it reduces the cost of storage and computation, and neural networks turn out to be surprisingly insensitive ...
2 votes, 1 answer, 6k views
How to Enable Mixed precision training
I'm trying to train a deep learning model in VS Code, so I would like to use the GPU for that. I have CUDA 11.6, an NVIDIA GeForce GTX 1650, tensorflow-gpu==2.5.0 and pip version 21.2.3 on Windows 10. ...
1 vote, 2 answers, 2k views
Why is it dangerous to convert integers to float16?
I have run recently into a surprising and annoying bug in which I converted an integer into a float16 and the value changed:
>>> import numpy as np
>>> np.array([2049]).astype(np....
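The same effect is easy to reproduce outside NumPy; a C sketch, assuming a compiler with _Float16: binary16 has an 11-bit significand, so above 2048 only every second integer is representable, and 2049 rounds (ties to even) down to 2048:

    #include <stdio.h>

    int main(void) {
        _Float16 h = (_Float16)2049;   /* 2049 is halfway between 2048 and 2050 -> ties to even */
        printf("%d\n", (int)h);        /* prints 2048 */
        return 0;
    }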
1 vote, 2 answers, 2k views
Bit shifting a half-float into a float
I have no choice but to read in 2 bytes that make up a half-float. I would like to work with this in the form of a 4-byte float. I've done some research and the only thing I can come up with is bit ...
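A sketch of the bit-level widening being described, assuming IEEE 754 binary16 input and binary32 output; it rebuilds the sign, exponent and significand fields and covers zeros, subnormals, infinities and NaNs:

    #include <stdint.h>
    #include <string.h>

    /* widen an IEEE 754 binary16 bit pattern to a binary32 float */
    static float half_bits_to_float(uint16_t h) {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;   /* sign -> bit 31 */
        uint32_t exp  = (h >> 10) & 0x1Fu;               /* 5-bit exponent field */
        uint32_t man  = h & 0x3FFu;                      /* 10-bit significand field */
        uint32_t bits;

        if (exp == 0x1Fu) {                      /* infinity or NaN */
            bits = sign | 0x7F800000u | (man << 13);
        } else if (exp != 0u) {                  /* normal number: rebias 15 -> 127 */
            bits = sign | ((exp + 112u) << 23) | (man << 13);
        } else if (man == 0u) {                  /* signed zero */
            bits = sign;
        } else {                                 /* subnormal: normalize, then rebias */
            int shift = 0;
            while (!(man & 0x400u)) { man <<= 1; ++shift; }
            man &= 0x3FFu;                       /* leading 1 becomes implicit in binary32 */
            bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
        }

        float f;
        memcpy(&f, &bits, sizeof f);             /* reinterpret the 32-bit pattern */
        return f;
    }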