Numeric separators by weswigham · Pull Request #20324 · microsoft/TypeScript

weswigham · 2017-11-29T03:09:32Z

This implements the now stage-3 numeric separators proposal.

This enables us to parse literals with underscore separators such as 1_000_000_000, 0b1101_0101_1001, and 0xAE_FE_2F. At present, these are always downleveled to a non-underscore-containing version.

mhegazy · 2017-11-29T18:17:47Z

src/compiler/utilities.ts

        // If we don't need to downlevel and we can reach the original source text using
        // the node's parent reference, then simply get the text as it was originally written.
-        if (!nodeIsSynthesized(node) && node.parent) {
+        if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) {


we should emit them as is for ESNext i would say..

Acctually ES2018 and later.

Should we add an es2018 target now or wait until the ES2018 spec is official?

@mhegazy said offline that we should open an issue to add an ES2018 target, since there are some lib additions to make in addition to adding an output mode that preserves these separators.

Opened #20342 to track.

mhegazy · 2017-11-29T18:28:05Z

can you run the perf tests as well.

rbuckton

Can you take a look at https://docs.google.com/presentation/d/1E8yKRJwA4iX_EctpY48KGBwAsCtNZpTOz4wu7tbxTqE/edit#slide=id.g29f65e0163_0_202 to make sure we test those examples as well? It looks like you covered most of them, but I didn't see tests for a few like ._ and 1\u005F01234 that it would be worth verifying we cover.

rbuckton · 2017-11-29T20:04:46Z

src/compiler/scanner.ts

+                error(Diagnostics.Numeric_seperators_are_not_allowed_at_the_end_of_a_literal, 1);
+                pos++;
+            }
+            return result = (result || "") + text.substring(start, pos);


You definitely assign result to at least "", so why not just initialize result above?

Also, you don't need to reassign result here.

You definitely assign result to at least "", so why not just initialize result above?

result is only assigned in the presence of a separator.

Also, you don't need to reassign result here.

Done.

rbuckton · 2017-11-29T20:12:24Z

src/compiler/scanner.ts

+                break;
+            }
+            if (text.charCodeAt(pos - 1) === CharacterCodes._) {
+                pos--;


We do this exact set of three lines multiple times in this PR. Would it make sense to provide an optional argument to error to override the position it uses?

Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.

rbuckton · 2017-11-29T20:15:24Z

src/compiler/scanner.ts

+                    start = pos;
+                    continue;
+                }
+                if (isDigit(ch)) {


Considering the similarities between this, scanHexDigit, and scanBinaryOrOctalDigits perhaps we should unify (at least in part) these three functions by having scanNumberFragment take a callback for isDigit and then calling it as scanNumberFragment(isDigit), scanNumberFragment(isHexDigit), scanNumberFragment(isBinaryDigit), and scanNumberFragment(isOctalDigit).

scanBinaryOrOctalDigits and scanHexDigit also do arithmetic to calculate the actual numeric value, however (rather than relying on the host being able to parse the literal correctly). I would also need to pass thru another callback to retain that behavior, too.

scanHexDigit also has to deal with validating that there are a certain number of digits and have an option to not recognize separators, for when its used while handling escape sequences. IMO, while the general structure is similar, they're distinct enough to not warrant a shared base.

Alternatively, you can pass in a base of type 2 | 8 | 10 | 16 like in scanBinaryOrOctalDigits and special case hex characters for base 16?

Worth keeping focus on performance when parsing numbers, even at a cost of flexibility. Too hot a code path.

rbuckton · 2017-11-29T20:18:27Z

src/compiler/utilities.ts

        // If we don't need to downlevel and we can reach the original source text using
        // the node's parent reference, then simply get the text as it was originally written.
-        if (!nodeIsSynthesized(node) && node.parent) {
+        if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) {


Should we add an es2018 target now or wait until the ES2018 spec is official?

rbuckton · 2017-11-29T20:21:32Z

tests/baselines/reference/parser.numericSeparators.binaryNegative.errors.txt

+==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts (3 errors) ====
+    0_B0101
+     ~
+!!! error TS6188: Numeric seperators are not allowed at the end of a literal.


The error message isn't clear considering the context. Perhaps it should be Numeric separators are not allowed here.

rbuckton · 2017-11-29T20:22:32Z

src/compiler/diagnosticMessages.json

        "category": "Message",
        "code": 6187
    },
+    "Numeric seperators are not allowed at the end of a literal.": {


rbuckton · 2017-11-29T20:24:26Z

tests/baselines/reference/parser.numericSeparators.binaryNegative.errors.txt

+
+!!! error TS1177: Binary digit expected.
+      ~~~~
+!!! error TS2304: Cannot find name '_110'.


Should we consider just parsing out the rest of the number rather than just stop? Then we'd avoid the excess Cannot find name errors.

The proposed grammar says the production is effectively Digit Sep Digit, so we'd be much more lenient than is needed. Which seems fine; I don't think 10__01 can be treated as anything other than a number with too many separators.

rbuckton · 2017-11-29T20:26:00Z

tests/baselines/reference/parser.numericSeparators.binaryNegative.errors.txt

+!!! error TS2304: Cannot find name 'B0101'.
+
+==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts (3 errors) ====
+    0b01__11


As above, after we report the numeric separator error, we may want to continue to parse out the rest of the number to avoid excessive unrelated errors, though I'd like to get @mhegazy's or @DanielRosenwasser's opinion on that.

rbuckton · 2017-11-29T20:38:18Z

tests/cases/conformance/parser/ecmascriptnext/numericSeparators/parser.numericSeparators.hex.ts

+0x00_11;
+0X0_1;
+0x1100_0011;
+0X0_11_0101;


Per tc39/proposal-numeric-separator#25 it looks like "\u{10_ffff}" is permitted as an extended Unicode escape sequence. Can you add this to your existing tests?

Discussion on tc39/proposal-numeric-separator#25 has continued and it is apparently now an explicit syntax error to witness an _ in a literal in an escape sequence.

DanielRosenwasser · 2017-11-29T21:10:17Z

src/compiler/scanner.ts

                }
+            }
+            if (tokenFlags & TokenFlags.ContainsSeparator) {
+                if (decimalFragment && scientificFragment) {


What about just

let result = mainFragment; if (decimalFragment) { result += "." + decimalFragment; } if (scientificFragment) { result += "." + scientificFragment; } return "" + +mainFragment;

DanielRosenwasser · 2017-11-29T21:11:17Z

src/compiler/scanner.ts

        function scanHexDigits(minCount: number, scanAsManyAsPossible: boolean): number {
            let digits = 0;
            let value = 0;
+            let seperatorAllowed = false;


separatorAllowed

DanielRosenwasser · 2017-11-29T21:11:35Z

src/compiler/scanner.ts

            // For counting number of digits; Valid binaryIntegerLiteral must have at least one binary digit following B or b.
            // Similarly valid octalIntegerLiteral must have at least one octal digit following o or O.
            let numberOfDigits = 0;
+            let seperatorAllowed = false;


weswigham · 2017-11-29T22:40:32Z

@rbuckton I think I've hit on all your comments except factoring scanNumberFragment to be a common base to all the number-scanners (which, IMO, isn't worth it with all the specialized functionality the hex scanner needs that the other two do not).

It's probably worth noting that I included regexp literals with the u flag in my unicode escape test; we don't currently support them (rather, we don't downlevel them at all), so none of them have errors flagged, but I figure it would be good to include them as a reference.

weswigham · 2017-12-08T19:02:11Z

@mhegazy

Benchmark	Name	Iterations
Current	numeric-seperators	10
Baseline	master	10

Metric	master	numeric-seperators	Delta	Best	Worst
Compiler - Unions - node (v9.0.0, x64)
Memory used	205,991k (± 0.01%)	206,039k (± 0.01%)	+49k (+ 0.02%)	205,981k	206,092k
Parse Time	0.68s (± 4.36%)	0.65s (± 0.68%)	-0.02s (- 3.69%)	0.64s	0.66s
Bind Time	0.64s (± 3.92%)	0.62s (± 1.52%)	-0.01s (- 2.19%)	0.61s	0.65s
Check Time	11.35s (± 3.83%)	10.93s (± 0.49%)	-0.42s (- 3.70%)	10.85s	11.10s
Emit Time	0.00s (± 0.00%)	0.00s (± 0.00%)	0.00s ( NaN%)	0.00s	0.00s
Total Time	12.67s (± 3.80%)	12.21s (± 0.46%)	-0.46s (- 3.62%)	12.13s	12.39s
Monaco - node (v9.0.0, x64)
Memory used	333,101k (± 0.01%)	333,125k (± 0.02%)	+24k (+ 0.01%)	333,005k	333,269k
Parse Time	1.59s (± 4.24%)	1.53s (± 1.80%)	-0.05s (- 3.15%)	1.50s	1.64s
Bind Time	0.68s (± 5.63%)	0.67s (± 4.55%)	-0.01s (- 1.61%)	0.63s	0.76s
Check Time	3.58s (± 2.46%)	3.48s (± 1.02%)	-0.09s (- 2.63%)	3.40s	3.54s
Emit Time	2.46s (± 4.36%)	2.35s (± 3.21%)	-0.11s (- 4.39%)	2.29s	2.56s
Total Time	8.31s (± 2.64%)	8.05s (± 0.88%)	-0.26s (- 3.13%)	7.94s	8.23s
TFS - node (v9.0.0, x64)
Memory used	289,271k (± 0.01%)	289,275k (± 0.01%)	+3k (+ 0.00%)	289,187k	289,356k
Parse Time	1.09s (± 1.46%)	1.10s (± 2.28%)	+0.01s (+ 0.64%)	1.07s	1.19s
Bind Time	0.53s (± 0.89%)	0.54s (± 2.49%)	+0.01s (+ 2.08%)	0.52s	0.59s
Check Time	3.10s (± 0.61%)	3.15s (± 1.73%)	+0.05s (+ 1.58%)	3.11s	3.37s
Emit Time	2.05s (± 1.43%)	2.08s (± 1.77%)	+0.03s (+ 1.42%)	2.03s	2.22s
Total Time	6.78s (± 0.56%)	6.87s (± 1.50%)	+0.10s (+ 1.42%)	6.77s	7.24s

Perf shows little to no change (erring towards slight parse time improvement). Should we merge?

DanielRosenwasser

Looks good. Ideally, we'd continue parsing Unicode escapes with separators, and give a better error on consecutive separators, but that can be done later.

DanielRosenwasser · 2017-11-29T21:11:37Z

src/compiler/scanner.ts

+            let seperatorAllowed = false;
            while (true) {
                const ch = text.charCodeAt(pos);
+                // Numeric seperators are allowed anywhere within a numeric literal, except not at the beginning, or following another seperator


DanielRosenwasser · 2017-11-29T21:13:48Z

src/compiler/scanner.ts

+                break;
+            }
+            if (text.charCodeAt(pos - 1) === CharacterCodes._) {
+                pos--;


Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.

DanielRosenwasser · 2017-12-08T22:25:01Z

src/compiler/scanner.ts

+                        result += text.substring(start, pos);
+                    }
+                    else {
+                        error(Diagnostics.Numeric_separators_are_not_allowed_here, pos, 1);


"Multiple consecutive numeric separators are not permitted."

ECMAScript numeric separator proposal is now on stage3 https://github.com/tc39/proposal-numeric-separator and will come to TypeScript at version 2.7. microsoft/TypeScript#20324 Example: ```typescript const i = 123_456; const x = 0x1fa_EF6; const f = 123_456.0_456; const f2 = 123_456.4_5e1_2; ```

weswigham added 3 commits November 28, 2017 17:47

Add support into octal and binary literals

079b96e

Add hex support

e214d98

And finally support all numeric literals and fix spelling

d980e4a

weswigham requested review from mhegazy and rbuckton November 29, 2017 03:09

mhegazy reviewed Nov 29, 2017

View reviewed changes

rbuckton suggested changes Nov 29, 2017

View reviewed changes

rbuckton reviewed Nov 29, 2017

View reviewed changes

DanielRosenwasser reviewed Nov 29, 2017

View reviewed changes

ajafff mentioned this pull request Nov 29, 2017

Support numeric separators ajafff/tsutils#15

Closed

weswigham added 5 commits November 29, 2017 13:23

Update error message

60d8bdf

Refactor error in scanner to take a position

771a1d6

Scan no separators in escape sequences, add escape sequence tests

cd0777a

More decimal tests from the spec presentation examples

cc84c0b

Permissive scanning of excess separators

7ac914d

weswigham added 2 commits November 29, 2017 14:42

Remove unnecessary assignment

11dddbb

Make code easier to follow

c2b81ec

rbuckton approved these changes Nov 30, 2017

View reviewed changes

Merge branch 'master' into numeric-seperators

ec4540d

DanielRosenwasser reviewed Dec 8, 2017

View reviewed changes

DanielRosenwasser mentioned this pull request Dec 8, 2017

Give better error message on consecutive numeric separators #20585

Closed

weswigham merged commit 9e51882 into microsoft:master Dec 9, 2017

weswigham deleted the numeric-seperators branch December 9, 2017 01:20

This was referenced Dec 26, 2017

TypeScript 2.7: exclamation marks in property declarations aren't parsed prettier/prettier#3561

Closed

TypeScript 2.7: numeric separators aren't parsed prettier/prettier#3578

Closed

rhysd mentioned this pull request Jan 11, 2018

Support numeric separator (i.e. 123_456) leafgarland/typescript-vim#129

Merged

microsoft locked and limited conversation to collaborators Jun 14, 2018

Conversation

weswigham commented Nov 29, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mhegazy commented Nov 29, 2017

Uh oh!

rbuckton left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rbuckton Nov 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

weswigham commented Nov 29, 2017

Uh oh!

weswigham commented Dec 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DanielRosenwasser left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

rbuckton Nov 29, 2017 •

edited

Loading

weswigham commented Dec 8, 2017 •

edited

Loading