I will attempt to summarise here the primary points of discussion from #7993, which has spiraled into many threads; I suspect a bullet-point summary of the questions to be answered will be significantly easier on the committee.
Regarding Binary & Hex Parsing
As part of the refactor and the introduction of binary parsing, the methodology of hex parsing has also been altered a bit. Hex parsing currently produces the same results as before, with the caveat that literals with values above Int64.MaxValue are now also accepted.
Parse as C# Literals
With that in mind, @mklement brought up the point that we may want to simply change how hex and binary parsing work: that is, mimic C#'s behaviour for these literals in source, which would mean parsing all hexadecimal literals as strictly positive (no more 0xFFFFFFFF -eq -1; instead, 0xFFFFFFFF -eq UInt32.MaxValue) and having such literals smoothly convert up to unsigned types.
Under that scheme, the parser for hex or binary literals would select the smallest type able to hold the parsed value (when no type suffix is specified), in the following order: Int32, UInt32, Int64, UInt64, Decimal, and possibly finally BigInteger. A sketch of the resulting behaviour is below.
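For illustration, a rough comparison of the two schemes follows; the C#-style results are hypothetical and assume the widening order proposed above.

```powershell
# Current behaviour: the sign bit is honoured at 32 bits, so an all-ones
# 32-bit hex literal parses as a negative [int].
0xFFFFFFFF -eq -1    # True under current parsing

# Proposed C#-style behaviour (hypothetical): hex literals parse as strictly
# positive and widen Int32 -> UInt32 -> Int64 -> UInt64 -> Decimal:
# 0x7FFFFFFF         -> [int]    2147483647
# 0xFFFFFFFF         -> [uint32] 4294967295   (-eq [uint32]::MaxValue)
# 0xFFFFFFFFFFFFFFFF -> [uint64] 18446744073709551615
```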
Other Options
Hex & BigInteger
If we elect to keep the current hex behaviour, we need to consider how it would behave in ranges higher than Decimal. BigInteger's default parser for hex numerals simply assumes the highest bit of the leading digit is indicative of the sign. As a result, any numeral treated as signed that begins with 0x8 or higher will be considered a negative two's-complement representation once we enter ranges that can only be parsed as BigInteger. This could easily be overridden if that behaviour is considered undesirable.
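To make that concrete, the following shows the framework's existing behaviour (this is current .NET parsing, not a proposal):

```powershell
using namespace System.Globalization

# BigInteger's hex parser treats a leading digit of 8-F as a set sign bit:
[bigint]::Parse('FF',  [NumberStyles]::HexNumber)   # -1   (two's complement)
[bigint]::Parse('80',  [NumberStyles]::HexNumber)   # -128
[bigint]::Parse('0FF', [NumberStyles]::HexNumber)   # 255  (leading 0 forces positive)
```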
Binary Parsing with Sign Bits
Then we face the issue of what to do about binary parsing. I doubt most folks working directly with binary will be working in ranges above 32-bit numbers, but I could be very wrong on that. Binary literals are, however, easier to work with in the byte, short, and int value ranges (8-, 16-, and 32-digit literals), and the behaviour of a sign bit here is entirely up to the parser, since binary parsing has a custom implementation for speed reasons.
Should binary sign bits only be accepted at 32-bit lengths and up, for consistency with hex parsing? Or should they be accepted at literal lengths that visually match hex (8 binary digits alongside an 8-character hex literal)? The latter would place a sign bit at the 8-, 16-, and 32-digit lengths of a binary literal, so 0b11111111 -eq -1 and so forth, which looks similar in behaviour to hex's 0xFFFFFFFF -eq -1 despite the obvious difference in actual bit length. Both options are sketched below.
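The two options would behave roughly as follows (all results hypothetical, pending a decision):

```powershell
# Option 1: sign bit honoured only at 32 digits and up (consistent with hex)
# 0b11111111                          ->  255  (8 digits, always positive)
# 0b11111111111111111111111111111111  ->   -1  (32 digits, sign bit honoured)

# Option 2: sign bit honoured at 8-, 16-, and 32-digit lengths (visual parity)
# 0b11111111                          ->   -1  (8 digits, sign bit honoured)
# 0b0000000011111111                  ->  255  (16 digits, sign bit clear)
```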
Parse Numeric Literals with Underscores
E.g., 1_000_000, 0b0101_1010, 0xF_FFF_F_FFF and so forth. Should this be allowed? C# already does this with literals in source code. Are there culture-sensitive concerns around this? This is a relatively simple addition.
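A sketch of the syntax, mirroring C#'s digit separators (hypothetical for PowerShell):

```powershell
# Underscores would be ignored between digits, exactly as in C# source:
# 1_000_000      -> 1000000
# 0b0101_1010    -> 90 (same as 0x5A)
# 0xF_FFF_F_FFF  -> 0xFFFFFFFF (separators may fall anywhere between digits)
```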
Experimental Feature Possibilities
If this is the best option, I am not at all opposed to hiding alternate parse methods behind experimental flags if need be. For that to be possible, however, a "standard" acceptable behaviour needs to be clearly defined so that I can implement it for the hex and binary parse methods.
Original post is below. PR #7901 added byte literals (suffix y or uy), so that portion of this issue is completed.
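For reference, the merged suffixes look like this (y for signed bytes, uy, F#-style, for unsigned):

```powershell
127y     # [sbyte] 127 (signed byte literal)
255uy    # [byte]  255 (unsigned byte literal)
```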
See the discussion in #7509.
Emerging from the interesting dust of modifying the tokenizer are two further points:
- The tokenizer should be able to parse binary literals.
- The tokenizer should support byte-type literals.
The trouble here is that both of these suggestions could arguably use a b suffix for numeric literals.
My opinion is that the b suffix should be used for byte literals, in keeping with the current convention that suffixes alter the numeric data type rather than simply the base representation.
So what about binary? Well, jcotton in the PowerShell Slack/IRC/Discord #bridge channel mentioned that, just as common hex representations use the 0x prefix (e.g., 0xAAFF057), we could follow the common prefix convention for binary as well: 0b01001011
From my brief poking about, it looks like we may have to alter System.Globalization.NumberStyles to add a Binary flag value, if we follow the current implementation of hexadecimal numerals. We don't necessarily have to.
TryGetNumberValue in tokenizer.cs would also have to be modified, likely to accept some kind of enum for number formats; currently it only accepts a bool indicating hex. ScanNumberHelper would also need to be modified for this.
The suffix approach is simpler, especially with the changes already in #7509, which make adding suffixes a good deal easier. However, given that we may want to reserve the b suffix for 123b byte literals, we may need to consider adding a case for the 0b0010101 syntax.
What do you guys think?
Other suggested suffixes for byte literals:
- ub, with sb or b for signed bytes
- uy (F# style), with y for signed bytes