16-bit half-precision floating-point number.
var Float16 = require( '@stdlib/number/float16/ctor' );
16-bit half-precision floating-point number constructor.
var x = new Float16( 5.0 );
// returns <Float16>
Static property returning the constructor name.
var str = Float16.name;
// returns 'Float16'
Size (in bytes) of the underlying value.
var nbytes = Float16.BYTES_PER_ELEMENT;
// returns 2
Size (in bytes) of the underlying value.
var x = new Float16( 5.0 );
var nbytes = x.BYTES_PER_ELEMENT;
// returns 2
A Float16 instance has the following properties...
A read-only property returning the underlying value as a number.
var x = new Float16( 5.0 );
var v = x.value;
// returns 5.0
These methods do not mutate a Float16 instance and, instead, return a representation of the half-precision floating-point number.
Returns a string representation of a Float16 instance.
var x = new Float16( 5.0 );
var str = x.toString();
// returns '5'
x = new Float16( -3.14 );
str = x.toString();
// returns '-3.140625'
Returns a JSON representation of a Float16 instance. JSON.stringify() implicitly calls this method when stringifying a Float16 instance.
var x = new Float16( 5.0 );
var o = x.toJSON();
/*
    {
        "type": "Float16",
        "value": 5.0
    }
*/
To revive a Float16 number from a JSON string, see @stdlib/number/float16/reviver.
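The round trip described above (implicit `toJSON` on stringify, a reviver on parse) can be sketched without the package installed. In this sketch, `Half` is a stand-in for illustration only, not the `Float16` constructor, and `revive` is a hypothetical reviver written for this example; how the actual reviver package is implemented is not shown here.

```javascript
// Stand-in for illustration only; NOT the @stdlib Float16 constructor.
// It stores the value as-is and performs no half-precision rounding.
function Half( value ) {
    this.value = value;
}

// JSON.stringify() looks for a `toJSON` method and serializes its result:
Half.prototype.toJSON = function toJSON() {
    return {
        'type': 'Float16',
        'value': this.value
    };
};

// Hypothetical reviver: rebuilds an instance from the serialized shape and
// returns every other value unchanged.
function revive( key, value ) {
    if ( value && value.type === 'Float16' && typeof value.value === 'number' ) {
        return new Half( value.value );
    }
    return value;
}

var str = JSON.stringify( { 'x': new Half( 5.0 ) } );
// returns '{"x":{"type":"Float16","value":5}}'

var obj = JSON.parse( str, revive );
console.log( obj.x instanceof Half );
// => true
```

The reviver checks the `type` field before reconstructing, so plain numbers, strings, and unrelated objects in the same JSON string pass through untouched.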
Converts a Float16 instance to a primitive value.
var x = new Float16( 5.0 );
var v = x.valueOf();
// returns 5.0
x = new Float16( 3.14 );
v = x.valueOf();
// returns 3.140625
- The underlying value is stored as an IEEE 754 half-precision floating-point number with 1 sign bit, 5 exponent bits, and 10 significand bits.
- A half-precision floating-point number has a range of approximately ±6.55e4 and a precision of about 3-4 decimal digits.
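To make the bit layout in the notes above concrete, the following is a minimal sketch that splits a 16-bit pattern into its sign, exponent, and significand fields and computes the encoded value. `decodeHalfBits` is a hypothetical helper written for this example and is not part of the package; it assumes the standard IEEE 754 binary16 encoding with an exponent bias of 15.

```javascript
// Hypothetical helper (not part of @stdlib): decodes a 16-bit pattern using
// the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 significand bits).
function decodeHalfBits( bits ) {
    var significand = bits & 0x3ff;        // low 10 bits
    var exponent = ( bits >>> 10 ) & 0x1f; // next 5 bits (biased by 15)
    var sign = ( bits >>> 15 ) & 0x1;      // top bit
    var value;

    if ( exponent === 0x1f ) {
        // All exponent bits set: infinity or NaN
        value = ( significand === 0 ) ? Infinity : NaN;
    } else if ( exponent === 0 ) {
        // Zero or subnormal: no implicit leading 1
        value = significand * Math.pow( 2, -24 );
    } else {
        // Normal number: implicit leading 1
        value = ( 1 + ( significand/1024 ) ) * Math.pow( 2, exponent-15 );
    }
    return {
        'sign': sign,
        'exponent': exponent,
        'significand': significand,
        'value': ( sign ) ? -value : value
    };
}

var max = decodeHalfBits( 0x7bff ).value;
// returns 65504

var v = decodeHalfBits( 51648 ).value;
// returns -11.5
```

The largest finite pattern, `0x7bff`, decodes to 65504, which is where the approximate range of ±6.55e4 comes from. With only 10 stored significand bits, adjacent values between 2 and 4 are 2^-9 (about 0.002) apart, consistent with 3-4 decimal digits of precision.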
var Float16 = require( '@stdlib/number/float16/ctor' );
var x = new Float16( 3.14 );
console.log( 'type: %s', typeof x );
// => 'type: object'
console.log( 'str: %s', x );
// => 'str: 3.140625'
console.log( 'value: %d', x.value );
// => 'value: 3.140625'
console.log( 'JSON: %s', JSON.stringify( x ) );
// => 'JSON: {"type":"Float16","value":3.140625}'
#include "stdlib/number/float16/ctor.h"
An opaque type definition for a half-precision floating-point number.
stdlib_float16_t v = stdlib_float16_from_bits( 51648 );
An opaque type definition for a union for accessing the underlying binary representation of a half-precision floating-point number.
#include <stdint.h>
stdlib_float16_t x = stdlib_float16_from_bits( 51648 );
stdlib_float16_bits_t y;
y.value = x;
uint16_t bits = y.bits;
// returns 51648
The union has the following members:
- value: stdlib_float16_t half-precision floating-point number.
- bits: uint16_t binary representation.
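The union allows "type punning"; however, while (more or less) defined in C99, reading a union member other than the one most recently written is undefined behavior in standard C++, even though many compilers support it as an extension. For more robust conversion, prefer using explicit helpers for converting to and from binary representation.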
Converts a 16-bit binary representation to a half-precision floating-point number.
stdlib_float16_t v = stdlib_float16_from_bits( 51648 ); // => -11.5
The function accepts the following arguments:
- bits: [in] uint16_t 16-bit integer corresponding to a binary representation.
Converts a half-precision floating-point number to a 16-bit binary representation.
#include <stdint.h>
stdlib_float16_t v = stdlib_float16_from_bits( 51648 ); // => -11.5
uint16_t bits = stdlib_float16_to_bits( v );
The function accepts the following arguments:
- x: [in] stdlib_float16_t half-precision floating-point number.
- The stdlib_float16_t type should be treated as a storage and interchange type. Native hardware support for mathematical functions operating on half-precision floating-point numbers varies. As a consequence, for most operations, one should first promote to single-precision (i.e., float), perform the desired operation, and then downcast back to half-precision.
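The note above concerns the C type, but the same promote, operate, downcast pattern can be sketched in JavaScript, where ordinary numbers are double-precision (so the promotion here is to double rather than to float). `roundToHalf` is a hypothetical helper written for this example, not part of the package; it assumes round-half-to-even and overflow to infinity, as in IEEE 754 binary16.

```javascript
// Hypothetical helper (not part of @stdlib): rounds a double to the nearest
// value representable in half precision, with ties rounded to even.
function roundToHalf( x ) {
    var ax = Math.abs( x );
    var ulp;
    var e;
    var q;
    var r;

    if ( x !== x || ax === Infinity || ax === 0.0 ) {
        return x; // NaN, infinities, and signed zeros pass through
    }
    // Exponent of the binade containing |x|, clamped to the subnormal range:
    e = Math.max( Math.floor( Math.log2( ax ) ), -14 );

    // Spacing between adjacent half-precision values in that binade:
    ulp = Math.pow( 2, e-10 );

    // Round |x|/ulp to the nearest integer, breaking ties toward even:
    q = ax / ulp;
    r = Math.floor( q );
    if ( q-r > 0.5 || ( q-r === 0.5 && r%2 === 1 ) ) {
        r += 1;
    }
    r *= ulp;

    // Anything that rounds past the largest finite value (65504) overflows:
    if ( r >= 65536 ) {
        r = Infinity;
    }
    return ( x < 0 ) ? -r : r;
}

// Values as they would be stored in half precision:
var a = roundToHalf( 0.1 );
var b = roundToHalf( 0.2 );

// Operate in the wider type, then downcast the result once:
var sum = roundToHalf( a + b );
console.log( sum );

var pi = roundToHalf( 3.14 );
// returns 3.140625
```

Rounding once, after the operation, keeps the arithmetic itself in the wider type and limits the half-precision error to the final storage step.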
#include "stdlib/number/float16/ctor.h"
#include <stdint.h>
#include <stdio.h>
int main( void ) {
    const stdlib_float16_t x[] = {
        stdlib_float16_from_bits( 51648 ), // -11.5
        stdlib_float16_from_bits( 18880 ) // 11.5
    };
    int i;
    for ( i = 0; i < 2; i++ ) {
        printf( "%d\n", stdlib_float16_to_bits( x[ i ] ) );
    }
}