0 votes
1 answer
202 views

GCC offers a 16-bit floating point type, outside of the C language standard: _Float16 - at least for x86_64. This allowance is described here. However - the GCC documentation does not seem to indicate ...
einpoklum
  • 138k
0 votes
1 answer
70 views

I have a C project configured with CMake. Some program within this project uses _Float16 (a "half-precision" type). I know how to determine, within the code, whether _Float16 is available: ...
einpoklum
  • 138k
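The excerpt cuts off before showing the in-code check; for context, a common form of it relies on the __FLT16_* macros that GCC and Clang predefine only on targets where _Float16 is usable (an assumption about the asker's setup, not something stated in the question):

```c
/* Minimal sketch: compile-time detection of _Float16 via the __FLT16_*
 * predefined macros (assumed to exist only where the type is supported). */
#if defined(__FLT16_MANT_DIG__)
typedef _Float16 half_t;
#define HAVE_FLOAT16 1
#else
typedef float half_t;   /* fall back to single precision */
#define HAVE_FLOAT16 0
#endif
```

A CMake-side test would typically wrap a snippet like this in check_c_source_compiles, which appears to be what the question itself is after.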
4 votes
2 answers
223 views

I have a _Float16 half-precision variable named x in my C program, and would like to printf() it. Now, I can write: printf("%f", (double) x);, and this will work; but - can I printf x ...
einpoklum
  • 138k
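A minimal compilable illustration of the cast-to-double approach mentioned in the excerpt (a sketch, assuming a GCC/Clang target that provides _Float16; C17's printf has no conversion specifier or length modifier for the type):

```c
#include <stdio.h>

int main(void)
{
    _Float16 x = (_Float16)0.5f;
    /* Convert explicitly so printf receives a plain double. */
    printf("x = %f\n", (double)x);
    return 0;
}
```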
6 votes
1 answer
570 views

I'm developing a library which uses _Float16s for many of the constants to save space when passing them around. However, in my testing, it seems that telling GCC to just "set it to 1" isn't ...
Coarse Rosinflower
1 vote
0 answers
62 views

I'm working on implementing a mathematical approach to bit flipping in IEEE 754 FP16 floating-point numbers without using direct bit manipulation. The goal is to flip a specific bit (particularly in ...
Muhammad Zaky
2 votes
0 answers
149 views

I'm concerned with mixed precision in deep-learning LLMs. The intermediates are mostly F32, and the weights could be any other type like BF16, F16, or even a quantized type such as Q8_0 or Q4_0. It would be very useful if ...
dentry
  • 21
1 vote
1 answer
582 views

Is it safe to assume that all machines on which AVX2 is supported also support F16C instructions? I haven't yet encountered a machine that doesn't. Thanks
Srihari S
  • 107
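Rather than relying on the implication, a runtime check of both bits is cheap. A sketch for GCC/Clang on x86 (F16C is CPUID leaf 1, ECX bit 29; AVX2 is CPUID leaf 7/subleaf 0, EBX bit 5; the OSXSAVE/XGETBV check for OS-enabled AVX state is omitted for brevity):

```c
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned eax, ebx, ecx, edx;
    int has_f16c = 0, has_avx2 = 0;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        has_f16c = (ecx >> 29) & 1;               /* CPUID.01H:ECX.F16C */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        has_avx2 = (ebx >> 5) & 1;                /* CPUID.07H:EBX.AVX2 */

    printf("AVX2: %d  F16C: %d\n", has_avx2, has_f16c);
    return 0;
}
```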
2 votes
1 answer
108 views

I am implementing emulation of ARM float16_t for x64 using SSE; the idea is to have bit-exact values on both platforms. I have mostly finished the implementation, except for one thing: I cannot correctly ...
Bogi
  • 2,718
0 votes
1 answer
67 views

Hi everyone. I've been learning about floating-point truncation errors recently, but I found that print(np.half(500.2)) and print(f"{np.half(500.2)}") yield different results. Here are the logs I got in ...
Cestimium
-2 votes
1 answer
667 views

I read on https://github.com/huggingface/smollm/tree/main/smol_tools (mirror 1): All models are quantized to 16-bit floating-point (F16) for efficient inference. Training was done on BF16, but in our ...
Franck Dernoncourt
3 votes
2 answers
791 views

I'm the developer of aerobus and I'm facing difficulties with half-precision arithmetic. At some point in the library, I need to convert an IntType to the related FloatType (same bit count) in a constexpr ...
Regis Portalez
0 votes
1 answer
137 views

Example: # pip install transformers from transformers import AutoModelForTokenClassification, AutoTokenizer # Load model model_path = 'huawei-noah/TinyBERT_General_4L_312D' model = ...
Franck Dernoncourt
-1 votes
1 answer
3k views

I load a huggingface-transformers float32 model, cast it to float16, and save it. How can I load it as float16? Example: # pip install transformers from transformers import ...
Franck Dernoncourt
0 votes
1 answer
777 views

I train a Huggingface model with fp16=True, e.g.: training_args = TrainingArguments( output_dir="./results", evaluation_strategy="epoch", learning_rate=4e-5, ...
Franck Dernoncourt
6 votes
1 answer
1k views

On CPUs with AVX-512 and BF16 support, you can use the 512-bit vector registers to store 32 16-bit floats. I have found intrinsics to convert FP32 values to BF16 values (for example: ...
Thijs Steel
  • 1,272
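The excerpt stops before naming the missing direction, but assuming it is BF16 back to FP32, no dedicated up-conversion intrinsic is needed: bfloat16 is just the top 16 bits of a binary32, so a shift suffices. A sketch using only AVX-512F intrinsics, treating the 16 BF16 values as raw 16-bit lanes of a __m256i:

```c
#include <immintrin.h>

/* Widen 16 bfloat16 values (raw bit patterns in 16-bit lanes) to 16 floats:
   zero-extend each lane to 32 bits, then move the payload into the high half. */
static inline __m512 bf16x16_to_f32x16(__m256i raw_bf16)
{
    __m512i widened = _mm512_cvtepu16_epi32(raw_bf16);
    widened = _mm512_slli_epi32(widened, 16);
    return _mm512_castsi512_ps(widened);
}
```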
0 votes
1 answer
330 views

To date I have had no issue compiling and running complex ARM Neon assembly language routines in Xcode/Clang, and the Apple M1 supposedly supports ARMv8.4. But - when I try to use half precision with ...
user2465201
0 votes
1 answer
143 views

I would like to know if CUDA provides a concept similar to std::floating_point but including all IEEE 754 types, e.g. __half. I provide below a code sample that tests that the __half template ...
Dimitri Lesnoff
0 votes
0 answers
209 views

I know that FEAT_FP16 is supported on my ARM CPU. I expected to see fp16 in the list of features reported by cat /proc/cpuinfo: $ cat /proc/cpuinfo | grep fp | sort -u Features : fp asimd evtstrm aes ...
pmor
  • 6,775
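For context on this unanswered question: on AArch64 Linux the half-precision capabilities are reported as fphp (scalar) and asimdhp (Advanced SIMD) rather than a literal fp16 string, and the same bits can be read from the auxiliary vector. A sketch, assuming an AArch64 Linux system:

```c
#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>   /* HWCAP_FPHP, HWCAP_ASIMDHP on AArch64 Linux */

int main(void)
{
    unsigned long caps = getauxval(AT_HWCAP);
    printf("scalar FP16 (fphp):  %s\n", (caps & HWCAP_FPHP)    ? "yes" : "no");
    printf("SIMD FP16 (asimdhp): %s\n", (caps & HWCAP_ASIMDHP) ? "yes" : "no");
    return 0;
}
```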
4 votes
2 answers
540 views

_mm256_mul_ps is the Intel intrinsic for "Multiply packed single-precision (32-bit) floating-point elements". _mm256_mul_ph is the intrinsic for "Multiply packed half-precision (16-bit) ...
dmeister
  • 35.9k
1 vote
1 answer
583 views

I want to train a Yolov8 model on a custom dataset with my Mac and this is my first time working on deep learning. Unfortunately, I experienced an error, RuntimeError: "...
Figtor
  • 11
0 votes
1 answer
92 views

In an application that can write numeric values to a file using BinaryWriter, I have a class that is typed to the number type that should be used for the file. It looks like this: class ValueCollection<...
ygoe
  • 20.8k
3 votes
1 answer
436 views

How do I use arm float16 intrinsics on Android? Consider the following program: #include <arm_neon.h> int main(int, char** argv) { const float16x8_t a = vdupq_n_f16(1.0F); const ...
fabian
  • 1,881
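A hedged sketch of the usual pattern in this situation: the float16 arithmetic intrinsics are only available when the compiler targets an FP16-capable architecture (e.g. something like -march=armv8.2-a+fp16 with the NDK's Clang), and ACLE exposes that as a feature macro that code can gate on:

```c
#include <arm_neon.h>

#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
/* Only compiled when the target actually supports FP16 vector arithmetic. */
float16x8_t scale_by_two(float16x8_t v)
{
    return vmulq_n_f16(v, (float16_t)2.0f);
}
#endif
```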
2 votes
2 answers
638 views

This is a variant of: How to print float value from binary file in shell? In that question, we wanted to print IEEE 754 single-precision (i.e. 32-bit) floating-point values from a binary file. Now ...
einpoklum
  • 138k
1 vote
1 answer
879 views

I have a kernel I'm running on an NVIDIA GPU, which uses the FP16 type __half, provided by cuda_fp16.hpp. To check something about its behavior, I also want to manipulate such __half values on the CPU....
einpoklum
  • 138k
0 votes
1 answer
243 views

I have some CUDA code which uses the half2 datatype. It should be just two 16 bit floating point numbers packed together in a 32 bit space. Apparently there are the methods __low2half and __high2half ...
Martin Ueding
1 vote
0 answers
270 views

I am working on an IEEE 754 16-bit adder, and I am confused at the round to nearest, ties to even logic. The first addition which confuses me is 169.8 (0x594E) + -0.06256 (0xAC01). After shifting and ...
Benjamin Owen
0 votes
0 answers
90 views

Am I correct in my assumption that reading a value from a .r16SNorm texture into the Metal Shading Language half data type always unavoidably incurs precision loss? It wasn't obvious to me from the start ...
simd
  • 2,059
5 votes
3 answers
872 views

According to https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/ , each SM has three types of CUDA cores, e.g. int32 cores, fp32 cores, and fp64 cores. If the datatype is int32/fp32/fp64, I think the ...
irasin
  • 175
1 vote
1 answer
1k views

I'm trying to use bfloat16 as a format for an application for work on HPC clusters. For this I've installed g++ 13, which supposedly supports the bfloat16 format, but this hasn't been working ...
Vistemboir
1 vote
3 answers
4k views

How can I convert a float (float32) to a half (float16) and the other way around in C while accounting for edge cases like NaN, Infinity, etc.? I don't need arithmetic because I just need the types in ...
juffma
  • 189
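A sketch of the easier direction (binary16 bit pattern to float), written against plain C99 with no compiler-specific types; the float-to-half direction additionally needs round-to-nearest-even and overflow-to-infinity handling:

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 bit pattern to a binary32 float, covering
   zeros, subnormals, infinities and NaNs. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1F) {                       /* Inf or NaN: exponent all ones */
        bits = sign | (0xFFu << 23) | (mant << 13);
    } else if (exp != 0) {                   /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {                  /* signed zero */
        bits = sign;
    } else {                                 /* subnormal: renormalize */
        int shift = 0;
        while (!(mant & 0x400u)) { mant <<= 1; ++shift; }
        mant &= 0x3FFu;                      /* drop the now-implicit leading 1 */
        bits = sign | ((uint32_t)(113 - shift) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```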
0 votes
1 answer
269 views

I am trying to compile a simple CUDA kernel with CuPy using the half precision format provided by the cuda_fp16 header file. My kernel looks like this: code = r''' extern "C" { #include <...
Markus Holzer
0 votes
0 answers
125 views

How can I divide a 16-bit floating-point number by another 16-bit floating-point number (half-precision)? I did the sign with an XOR gate and the exponent with a 5-bit subtractor, but couldn't do the mantissa. How can I ...
Arthur
  • 1
0 votes
1 answer
2k views

Arm Architecture Reference Manual for A-profile architecture (emphasis added): FPHP, bits [27:24] 0b0011 As for 0b0010, and adds support for half-precision floating-point arithmetic. A simple ...
pmor
  • 6,775
2 votes
0 answers
102 views

In my OpenCL kernel I use 16bit floating point values of type half from the cl_khr_fp16 extension. Although this gives me code that works well, I noticed with AMD's radeon developer tools that the ...
Bram
  • 8,463
3 votes
0 answers
151 views

I'm trying to train a TensorFlow (version 2.11.0) code in float16. I checked that FP16 is supported on the RTX 3090 GPU. So, I followed the below link to train the whole code in reduced precision. ...
Sherlock
1 vote
1 answer
986 views

For example, according to https://cocktailpeanut.github.io/dalai/#/ the relevant figures for LLaMA-65B are: Full: The model takes up 432.64GB Quantized: 5.11GB * 8 = 40.88GB The full model won't fit ...
rwallace
  • 34.2k
0 votes
0 answers
26 views

I have run into an issue where the value of a tensor is 6.3982e-2 in float32. After I changed it to float16 using the half() function, it became 6.3965e-2. Is there a method to convert the tensor without ...
zhangbw
  • 13
1 vote
1 answer
2k views

I am trying to atomically add a float value to a __half in CUDA 5.2. This architecture does support the __half data type and its conversion functions, but it does not include any arithmetic and atomic ...
Skip
  • 40
0 votes
0 answers
742 views

I want to train the model with FP32 and perform inference with FP16. For other networks (ResNet) with FP16, it worked. But EDSR (super resolution) with FP16 did not work. The differences I found are ...
SIwoo Lee
0 votes
1 answer
897 views

It's clear that float16 can save bandwidth, but can float16 also save compute cycles when computing transcendental functions, like exp()?
Leonardo Physh
38 votes
2 answers
6k views

I wonder why operating on Float64 values is faster than operating on Float16: julia> rnd64 = rand(Float64, 1000); julia> rnd16 = rand(Float16, 1000); julia> @benchmark rnd64.^2 ...
Shayan
  • 6,722
0 votes
1 answer
434 views

__device__​ __half2 __h2div ( const __half2 a, const __half2 b ) Description: Divides half2 input vector a by input vector b in round-to-nearest mode. __device__​ __half2 __hmul2 ( const __half2 a, ...
Aryan
  • 442
1 vote
0 answers
296 views

I am converting from f32 to bf16 in Rust and want to control the direction of the rounding error. Is there an easy way to do this? Converting using the standard bf16::to_f32 rounds to the nearest ...
Amir
  • 898
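The question is about Rust, but the underlying trick is language-agnostic: bfloat16 is the upper 16 bits of a binary32, so the rounding behaviour is decided by how the low 16 bits are disposed of. A C sketch of truncation (round toward zero) versus round-to-nearest-even (function names are mine; NaN payloads need separate care):

```c
#include <stdint.h>
#include <string.h>

static uint16_t f32_to_bf16_trunc(float f)        /* round toward zero */
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);
}

static uint16_t f32_to_bf16_nearest_even(float f) /* round to nearest, ties to even */
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    /* Adding 0xFFFF instead of this bias would round away from zero. */
    uint32_t round = 0x7FFFu + ((u >> 16) & 1u);
    return (uint16_t)((u + round) >> 16);
}
```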
4 votes
1 answer
197 views

I'm using half floats as implemented in the SoftFloat library (read: 100% IEEE 754 compliant), and, for the sake of completeness, I wish to provide my code with definitions equivalent to those ...
cesss
  • 913
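The excerpt truncates before naming which definitions are meant; assuming they are <float.h>-style limits, these are the IEEE 754 binary16 characteristics (the macro names here are illustrative, not from the question):

```c
#define HALF_MANT_DIG   11                       /* 10 stored bits + implicit 1 */
#define HALF_DIG        3
#define HALF_MIN_EXP    (-13)
#define HALF_MAX_EXP    16
#define HALF_MAX        65504.0f                 /* bit pattern 0x7BFF          */
#define HALF_MIN        6.103515625e-05f         /* 2^-14, smallest normal      */
#define HALF_TRUE_MIN   5.9604644775390625e-08f  /* 2^-24, smallest subnormal   */
#define HALF_EPSILON    9.765625e-04f            /* 2^-10                       */
```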
1 vote
1 answer
3k views

I'm trying to write a basic FP16-based calculator in Python to help me debug some hardware. I can't seem to find how to convert 16-bit hex values into floating-point values I can use in my code to do the ...
ajcrm125
  • 353
2 votes
2 answers
1k views

I have a simple question about the C language. I am implementing half-precision software using _Float16 in C (my Mac is ARM-based), but the running time is not much faster than single- or double-precision ...
YUNBLACK
8 votes
1 answer
2k views

It's clear why a 16-bit floating-point format has started seeing use for machine learning; it reduces the cost of storage and computation, and neural networks turn out to be surprisingly insensitive ...
rwallace
  • 34.2k
2 votes
1 answer
6k views

I'm trying to train a deep-learning model in VS Code, so I would like to use the GPU for that. I have CUDA 11.6, an NVIDIA GeForce GTX 1650, tensorflow-gpu==2.5.0 and pip version 21.2.3 on Windows 10. ...
samar
  • 33
1 vote
2 answers
2k views

I recently ran into a surprising and annoying bug in which I converted an integer into a float16 and the value changed: >>> import numpy as np >>> np.array([2049]).astype(np....
guhur
  • 2,916
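For context: binary16 carries an 11-bit significand, so above 2048 only even integers are representable, and 2049 (exactly halfway between 2048 and 2050) rounds to 2048 under round-to-nearest-even. The same effect, sketched in C with a compiler that provides _Float16:

```c
#include <stdio.h>

int main(void)
{
    _Float16 h = (_Float16)2049.0f;  /* rounds to the nearest binary16 value */
    printf("%.1f\n", (double)h);     /* prints 2048.0 */
    return 0;
}
```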
1 vote
2 answers
2k views

I have no choice but to read in 2 bytes that make up a half-float. I would like to work with this in the form of a 4-byte float. I've done some research and the only thing I can come up with is bit ...
Justin Barren
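One hedged possibility, assuming an x86 target with F16C and little-endian byte order (compile with something like -mf16c): assemble the two bytes into a binary16 bit pattern and let the hardware widen it. Otherwise, a bit-twiddling decode like the conversion sketch further up works anywhere.

```c
#include <stdint.h>
#include <immintrin.h>

static float half_bytes_to_float(const unsigned char bytes[2])
{
    /* Little-endian: bytes[0] is the low byte of the binary16 pattern. */
    uint16_t bits = (uint16_t)(bytes[0] | ((uint16_t)bytes[1] << 8));
    return _cvtsh_ss(bits);   /* F16C scalar half -> single conversion */
}
```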