I will attempt to summarise here the primary points of discussion from #7993, which has spiraled into many threads; I suspect a bullet-point summary of the questions to be answered will be significantly easier on the committee.
Regarding Binary & Hex Parsing
As part of the refactor and the introduction of binary parsing, the methodology of hex parsing has also been altered a bit. Hex parsing currently produces the same results as before, with the caveat that literals with values above Int64.MaxValue are now also accepted.
Parse as C# Literals
With that in mind, @mklement brought up the point that we may want to simply change how hex and binary parsing work: that is, mimic C#'s behaviour for these literals in source, which would mean parsing all hexadecimal literals as strictly positive (no more 0xFFFFFFFF -eq -1; instead, 0xFFFFFFFF -eq UInt32.MaxValue) and having such literals smoothly convert up to unsigned types.
Under that scheme, the parser for hex or binary literals would select the smallest type able to hold the parsed value (when no type suffix is specified), in the following order: Int32, UInt32, Int64, UInt64, Decimal, and possibly finally BigInteger. A sketch of the resulting behaviour is below.
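For illustration, a rough comparison of the two schemes follows; the C#-style results are hypothetical and assume the widening order proposed above.

```powershell
# Current behaviour: the sign bit is honoured at 32 bits, so an all-ones
# 32-bit hex literal parses as a negative [int].
0xFFFFFFFF -eq -1    # True under current parsing

# Proposed C#-style behaviour (hypothetical): hex literals parse as strictly
# positive and widen Int32 -> UInt32 -> Int64 -> UInt64 -> Decimal:
# 0x7FFFFFFF         -> [int]    2147483647
# 0xFFFFFFFF         -> [uint32] 4294967295   (-eq [uint32]::MaxValue)
# 0xFFFFFFFFFFFFFFFF -> [uint64] 18446744073709551615
```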
Other Options
Hex & BigInteger
If we elect to keep the current hex behaviour, we need to consider how it would behave in ranges higher than Decimal. BigInteger's default parser for hex numerals simply assumes the highest bit of the leading digit is indicative of the sign. As a result, any numeral treated as signed that begins with 0x8 or higher will be considered a negative two's-complement representation once we enter ranges that can only be parsed as BigInteger. This could easily be overridden if that behaviour is considered undesirable.
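To make that concrete, the following shows the framework's existing behaviour (this is current .NET parsing, not a proposal):

```powershell
using namespace System.Globalization

# BigInteger's hex parser treats a leading digit of 8-F as a set sign bit:
[bigint]::Parse('FF',  [NumberStyles]::HexNumber)   # -1   (two's complement)
[bigint]::Parse('80',  [NumberStyles]::HexNumber)   # -128
[bigint]::Parse('0FF', [NumberStyles]::HexNumber)   # 255  (leading 0 forces positive)
```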
Binary Parsing with Sign Bits
Then we face the issue of what to do about binary parsing. I doubt most folks working directly with binary will be working in ranges above 32-bit numbers, but I could be very wrong on that. Binary literals are, however, easier to work with in the byte, short, and int value ranges (8-, 16-, and 32-digit literals), and the behaviour of a sign bit here is entirely up to the parser, since binary parsing has a custom implementation for speed reasons.
Should binary sign bits only be accepted at 32-bit lengths and up, for consistency with hex parsing? Or should they be accepted at literal lengths that visually match hex (8 binary digits alongside an 8-character hex literal)? The latter would place a sign bit at the 8-, 16-, and 32-digit lengths of a binary literal, so 0b11111111 -eq -1 and so forth, which looks similar in behaviour to hex's 0xFFFFFFFF -eq -1 despite the obvious difference in actual bit length. Both options are sketched below.
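The two options would behave roughly as follows (all results hypothetical, pending a decision):

```powershell
# Option 1: sign bit honoured only at 32 digits and up (consistent with hex)
# 0b11111111                          ->  255  (8 digits, always positive)
# 0b11111111111111111111111111111111  ->   -1  (32 digits, sign bit honoured)

# Option 2: sign bit honoured at 8-, 16-, and 32-digit lengths (visual parity)
# 0b11111111                          ->   -1  (8 digits, sign bit honoured)
# 0b0000000011111111                  ->  255  (16 digits, sign bit clear)
```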
Parse Numeric Literals with Underscores
E.g., 1_000_000, 0b0101_1010, 0xF_FFF_F_FFF and so forth. Should this be allowed? C# already does this with literals in source code. Are there culture-sensitive concerns around this? This is a relatively simple addition.
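A sketch of the syntax, mirroring C#'s digit separators (hypothetical for PowerShell):

```powershell
# Underscores would be ignored between digits, exactly as in C# source:
# 1_000_000      -> 1000000
# 0b0101_1010    -> 90 (same as 0x5A)
# 0xF_FFF_F_FFF  -> 0xFFFFFFFF (separators may fall anywhere between digits)
```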
Experimental Feature Possibilities
If this is the best option, I am not at all opposed to hiding alternate parse methods behind experimental flags if need be. For that to be possible, however, a "standard" acceptable behaviour needs to be clearly defined so that I can implement it for the hex and binary parse methods.
Original post is below. PR #7901 added byte literals (suffix y or uy), so that portion of this issue is completed.
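For reference, the merged suffixes look like this (y for signed bytes, uy, F#-style, for unsigned):

```powershell
127y     # [sbyte] 127 (signed byte literal)
255uy    # [byte]  255 (unsigned byte literal)
```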
See the discussion in #7509.
Emerging from the interesting dust of modifying the tokenizer are two further points:
- The tokenizer should be able to parse binary literals.
- The tokenizer should support byte-type literals.
The trouble here is that both of these suggestions could arguably use a b suffix for numeric literals.
My opinion is that the b suffix should be used for byte literals, in keeping with the current convention that suffixes alter the numeric data type rather than simply the base representation.
So what about binary? Well, jcotton in the PowerShell Slack/IRC/Discord #bridge channel mentioned that, just as common hex representations use the 0x prefix (e.g., 0xAAFF057), we could follow the common prefix convention for binary as well: 0b01001011
From my brief poking about, it looks like we may have to alter System.Globalization.NumberStyles to add a Binary flag value, if we follow the current implementation of hexadecimal numerals. We don't necessarily have to.
TryGetNumberValue in tokenizer.cs would also have to be modified, likely to accept some kind of enum for number formats; currently it only accepts a bool indicating hex. ScanNumberHelper would also need to be modified for this.
The suffix approach is simpler, especially with the changes already in #7509, which make adding suffixes a good deal easier. However, given that we may want to reserve the b suffix for 123b byte literals, we may need to consider adding a case for the 0b0010101 syntax.
What do you guys think?
Other suggested suffixes for byte literals:
- ub, with sb or b for signed bytes
- uy (F# style), with y for signed bytes