Conversation
| // If we don't need to downlevel and we can reach the original source text using | ||
| // the node's parent reference, then simply get the text as it was originally written. | ||
| if (!nodeIsSynthesized(node) && node.parent) { | ||
| if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) { |
There was a problem hiding this comment.
we should emit them as is for ESNext i would say..
There was a problem hiding this comment.
Acctually ES2018 and later.
There was a problem hiding this comment.
Should we add an es2018 target now or wait until the ES2018 spec is official?
There was a problem hiding this comment.
@mhegazy said offline that we should open an issue to add an ES2018 target, since there are some lib additions to make in addition to adding an output mode that preserves these separators.
|
can you run the perf tests as well. |
rbuckton
left a comment
There was a problem hiding this comment.
Can you take a look at https://docs.google.com/presentation/d/1E8yKRJwA4iX_EctpY48KGBwAsCtNZpTOz4wu7tbxTqE/edit#slide=id.g29f65e0163_0_202 to make sure we test those examples as well? It looks like you covered most of them, but I didn't see tests for a few like ._ and 1\u005F01234 that it would be worth verifying we cover.
src/compiler/scanner.ts
Outdated
| error(Diagnostics.Numeric_seperators_are_not_allowed_at_the_end_of_a_literal, 1); | ||
| pos++; | ||
| } | ||
| return result = (result || "") + text.substring(start, pos); |
There was a problem hiding this comment.
You definitely assign result to at least "", so why not just initialize result above?
There was a problem hiding this comment.
Also, you don't need to reassign result here.
There was a problem hiding this comment.
You definitely assign
resultto at least"", so why not just initializeresultabove?
result is only assigned in the presence of a separator.
Also, you don't need to reassign
resulthere.
Done.
src/compiler/scanner.ts
Outdated
| break; | ||
| } | ||
| if (text.charCodeAt(pos - 1) === CharacterCodes._) { | ||
| pos--; |
There was a problem hiding this comment.
We do this exact set of three lines multiple times in this PR. Would it make sense to provide an optional argument to error to override the position it uses?
There was a problem hiding this comment.
Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.
| start = pos; | ||
| continue; | ||
| } | ||
| if (isDigit(ch)) { |
There was a problem hiding this comment.
Considering the similarities between this, scanHexDigit, and scanBinaryOrOctalDigits perhaps we should unify (at least in part) these three functions by having scanNumberFragment take a callback for isDigit and then calling it as scanNumberFragment(isDigit), scanNumberFragment(isHexDigit), scanNumberFragment(isBinaryDigit), and scanNumberFragment(isOctalDigit).
There was a problem hiding this comment.
scanBinaryOrOctalDigits and scanHexDigit also do arithmetic to calculate the actual numeric value, however (rather than relying on the host being able to parse the literal correctly). I would also need to pass thru another callback to retain that behavior, too.
There was a problem hiding this comment.
scanHexDigit also has to deal with validating that there are a certain number of digits and have an option to not recognize separators, for when its used while handling escape sequences. IMO, while the general structure is similar, they're distinct enough to not warrant a shared base.
There was a problem hiding this comment.
Alternatively, you can pass in a base of type 2 | 8 | 10 | 16 like in scanBinaryOrOctalDigits and special case hex characters for base 16?
There was a problem hiding this comment.
Worth keeping focus on performance when parsing numbers, even at a cost of flexibility. Too hot a code path.
| // If we don't need to downlevel and we can reach the original source text using | ||
| // the node's parent reference, then simply get the text as it was originally written. | ||
| if (!nodeIsSynthesized(node) && node.parent) { | ||
| if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) { |
There was a problem hiding this comment.
Should we add an es2018 target now or wait until the ES2018 spec is official?
| ==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts (3 errors) ==== | ||
| 0_B0101 | ||
| ~ | ||
| !!! error TS6188: Numeric seperators are not allowed at the end of a literal. |
There was a problem hiding this comment.
The error message isn't clear considering the context. Perhaps it should be Numeric separators are not allowed here.
src/compiler/diagnosticMessages.json
Outdated
| "category": "Message", | ||
| "code": 6187 | ||
| }, | ||
| "Numeric seperators are not allowed at the end of a literal.": { |
|
|
||
| !!! error TS1177: Binary digit expected. | ||
| ~~~~ | ||
| !!! error TS2304: Cannot find name '_110'. |
There was a problem hiding this comment.
Should we consider just parsing out the rest of the number rather than just stop? Then we'd avoid the excess Cannot find name errors.
There was a problem hiding this comment.
The proposed grammar says the production is effectively Digit Sep Digit, so we'd be much more lenient than is needed. Which seems fine; I don't think 10__01 can be treated as anything other than a number with too many separators.
| !!! error TS2304: Cannot find name 'B0101'. | ||
|
|
||
| ==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts (3 errors) ==== | ||
| 0b01__11 |
There was a problem hiding this comment.
As above, after we report the numeric separator error, we may want to continue to parse out the rest of the number to avoid excessive unrelated errors, though I'd like to get @mhegazy's or @DanielRosenwasser's opinion on that.
| 0x00_11; | ||
| 0X0_1; | ||
| 0x1100_0011; | ||
| 0X0_11_0101; |
There was a problem hiding this comment.
Per tc39/proposal-numeric-separator#25 it looks like "\u{10_ffff}" is permitted as an extended Unicode escape sequence. Can you add this to your existing tests?
There was a problem hiding this comment.
Discussion on tc39/proposal-numeric-separator#25 has continued and it is apparently now an explicit syntax error to witness an _ in a literal in an escape sequence.
src/compiler/scanner.ts
Outdated
| } | ||
| } | ||
| if (tokenFlags & TokenFlags.ContainsSeparator) { | ||
| if (decimalFragment && scientificFragment) { |
There was a problem hiding this comment.
What about just
let result = mainFragment;
if (decimalFragment) {
result += "." + decimalFragment;
}
if (scientificFragment) {
result += "." + scientificFragment;
}
return "" + +mainFragment;
src/compiler/scanner.ts
Outdated
| function scanHexDigits(minCount: number, scanAsManyAsPossible: boolean): number { | ||
| let digits = 0; | ||
| let value = 0; | ||
| let seperatorAllowed = false; |
src/compiler/scanner.ts
Outdated
| // For counting number of digits; Valid binaryIntegerLiteral must have at least one binary digit following B or b. | ||
| // Similarly valid octalIntegerLiteral must have at least one octal digit following o or O. | ||
| let numberOfDigits = 0; | ||
| let seperatorAllowed = false; |
|
@rbuckton I think I've hit on all your comments except factoring It's probably worth noting that I included regexp literals with the |
Perf shows little to no change (erring towards slight parse time improvement). Should we merge? |
DanielRosenwasser
left a comment
There was a problem hiding this comment.
Looks good. Ideally, we'd continue parsing Unicode escapes with separators, and give a better error on consecutive separators, but that can be done later.
src/compiler/scanner.ts
Outdated
| let seperatorAllowed = false; | ||
| while (true) { | ||
| const ch = text.charCodeAt(pos); | ||
| // Numeric seperators are allowed anywhere within a numeric literal, except not at the beginning, or following another seperator |
src/compiler/scanner.ts
Outdated
| break; | ||
| } | ||
| if (text.charCodeAt(pos - 1) === CharacterCodes._) { | ||
| pos--; |
There was a problem hiding this comment.
Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.
| result += text.substring(start, pos); | ||
| } | ||
| else { | ||
| error(Diagnostics.Numeric_separators_are_not_allowed_here, pos, 1); |
There was a problem hiding this comment.
"Multiple consecutive numeric separators are not permitted."
ECMAScript numeric separator proposal is now on stage3 https://github.com/tc39/proposal-numeric-separator and will come to TypeScript at version 2.7. microsoft/TypeScript#20324 Example: ```typescript const i = 123_456; const x = 0x1fa_EF6; const f = 123_456.0_456; const f2 = 123_456.4_5e1_2; ```
This implements the now stage-3 numeric separators proposal.
This enables us to parse literals with underscore separators such as
1_000_000_000,0b1101_0101_1001, and0xAE_FE_2F. At present, these are always downleveled to a non-underscore-containing version.